1 Introduction

Alternating direction methods in convex optimization have regained popularity due to their
ability to decentralize data and distribute computation in large-scale problems. The underlying
theory for the classical alternating direction methods, such as the alternating direction
method of multipliers (ADMM) or the alternating minimization algorithm (AMA), is mature,
as these methods have their roots in the splitting methods for monotone inclusions and in other
classical approaches, such as Douglas-Rachford splitting, Dykstra projections, and Hauzageau's
methods.

Unfortunately, the global convergence rate guarantees for existing methods do not provide
a satisfactory answer for the following simple but representative convex problem:

$$ d^{\star} := \min_{\lambda} \big\{ d(\lambda) := d_{\mathcal{C}_1}(\lambda) + d_{\mathcal{C}_2}(\lambda) \big\}, \tag{1} $$


where $\mathcal{C}_1$ and $\mathcal{C}_2$ are nonempty, closed, and convex sets, and $d_{\mathcal{C}}$ is the Euclidean distance to
the set $\mathcal{C}$. Note that when $\mathcal{C}_1 \cap \mathcal{C}_2 \neq \emptyset$, (1) returns a solution at the intersection of the two sets.
While the existing algorithms provide strong and weak convergence of their iterates for (1)
under different structural assumptions on $\mathcal{C}_1$ and $\mathcal{C}_2$, to the best of our knowledge, none provides an
approximate solution to the objective value with rigorous global convergence rate guarantees
under a mild set of assumptions. In addition, depending on the geometry of the sets $\mathcal{C}_1$ and $\mathcal{C}_2$,
existing methods can exhibit arbitrarily slow convergence.
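
To make the objective of (1) concrete, the following minimal sketch evaluates $d(\lambda)$ for two hypothetical sets chosen only for illustration: a unit ball and a halfspace in the plane. The set choices and all function names are assumptions of this sketch, not part of the paper.

```python
import math

def dist_to_ball(point, center, radius):
    """Euclidean distance from `point` to the ball {z : ||z - center|| <= radius}."""
    gap = math.dist(point, center) - radius
    return max(gap, 0.0)

def dist_to_halfspace(point, normal, offset):
    """Euclidean distance from `point` to the halfspace {z : <normal, z> <= offset}."""
    violation = sum(a * z for a, z in zip(normal, point)) - offset
    return max(violation, 0.0) / math.hypot(*normal)

def d(point):
    """Objective of (1) for C1 = unit ball at the origin and C2 = {z : z[0] <= 0.5}."""
    return (dist_to_ball(point, (0.0, 0.0), 1.0)
            + dist_to_halfspace(point, (1.0, 0.0), 0.5))

inside = d((0.2, 0.3))    # a point in the intersection of C1 and C2
outside = d((3.0, 0.0))   # a point that violates both sets
```

Any point of the intersection attains the optimal value zero, which is the sense in which (1) returns a solution of the feasibility problem.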

As a generalization of (1) in the dual setting, this paper studies the following constrained
convex optimization template in the primal setting:

$$ \min_{x := (u, v)} \big\{ f(x) := g(u) + h(v) \big\} \quad \text{s.t.} \quad Au + Bv = c, \tag{2} $$


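where $g : \mathbb{R}^{p_1} \to \mathbb{R} \cup \{+\infty\}$ and $h : \mathbb{R}^{p_2} \to \mathbb{R} \cup \{+\infty\}$ are two proper, closed and convex
functions, and $A \in \mathbb{R}^{n \times p_1}$, $B \in \mathbb{R}^{n \times p_2}$ and $c \in \mathbb{R}^n$ are given. The dual of (2) can be written as
follows, which covers (1) as a special case:

$$ d^{\star} := \min_{\lambda} \big\{ d(\lambda) := g^*(A^T\lambda) + h^*(B^T\lambda) - \langle c, \lambda \rangle \big\}, \tag{3} $$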


where $g^*$ and $h^*$ are the Fenchel conjugates of $g$ and $h$, respectively; $d$ is the dual function;
$\lambda$ is the dual variable; and $d^{\star}$ denotes the dual optimal value. The convex template (2)
also manifests itself when we apply convex splitting techniques to decompose the composite
objective $f$ into two terms $g$ and $h$ that are coupled via linear constraints. It can also include
convex constraints on $u$ and $v$.

This paper develops a new primal-dual algorithmic framework for (2) which processes
$g$ and $h$ in an alternating fashion to obtain numerical solutions to (2). This strategy often
provides computational advantages as compared to processing both terms jointly. The resulting
algorithms can be classified as alternating direction methods. While various solution methods
along this line for solving (2) can be found in the literature, our algorithms developed in this
paper are new compared to those. In particular, we require only a mild set of assumptions on $f$
and $g$, while achieving the best known global convergence rate on the primal problem (2) as well
as on the dual one (3). In addition, our methods do not require any parameter tuning. A more
thorough discussion of these differences is postponed to Section 7.

Our contributions: Our main contributions can be summarized as follows:

(a) (Theory) We introduce a split-gap reduction technique as a new framework for deriving new
alternating direction methods. Our framework unifies the model-based gap reduction technique
of [32], smoothing techniques, and the powerful forward-backward and Douglas-Rachford
splitting techniques. We establish explicit relations between the primal weighting
strategy, the parameter choices, and the global convergence rate of the algorithms in our
framework.

(b) (Algorithms and convergence guarantees) We propose two new smoothing alternating direction
optimization algorithms: the smoothing alternating minimization algorithm (SAMA) and the
smoothing alternating direction method of multipliers (SADMM). We derive update rules
for all the algorithmic parameters, including the penalty parameters, in a heuristic-free
fashion. We rigorously characterize the convergence rate of our algorithms for both the objective
residual $f(\bar{x}^k) - f^{\star}$ and the feasibility gap $\|A\bar{u}^k + B\bar{v}^k - c\|$. To the best of our knowledge, this
is the best known global convergence rate that can be achieved under the mildest assumptions
in the literature.

(c) (Special cases) We also illustrate that our technique can exploit additional assumptions on
$A$ or $B$, $g$ and $h$, whenever they are available.

We would like to emphasize the following key points of our framework. Except for convexity,
the only assumption we use is the boundedness of the domain diameter of $\mathrm{dom}(g)$ and $\mathrm{dom}(h)$.
This assumption is only required to obtain the worst-case complexity bounds; it is not used
at any step of our algorithms. In particular, this assumption can be guaranteed by assuming
that $g$ and $h$ are Lipschitz continuous [2]. We argue that these conditions are indeed mild,
and enable us to provide rate guarantees for a wider class of constrained convex problems.
While our algorithms aim at solving nonsmooth problems such as feasibility problems,
smooth convex applications without Lipschitz gradient objectives such as Poisson imaging,
graphical model learning, and RPCA problems can also be solved using our methods. Our
algorithms are heuristic-free in the sense that we update all the parameters automatically at
each iteration, including the so-called penalty parameter in alternating direction methods.
This solves the major drawback in augmented Lagrangian-based methods. We argue
that this key feature is important in parallel and distributed implementations, where tuning
parameters is impossible to carry out.

Intriguingly, our algorithms update their penalty parameters in a decreasing fashion, in stark
contrast to the classical algorithms. Computationally, the arithmetic cost per iteration of our
algorithms is fundamentally the same as the classical AMA and ADMM methods. Finally, we
can explicitly show how the choice of the algorithmic parameters can trade off the convergence
guarantee in the objective residual $f(\bar{x}^k) - f^{\star}$ and the primal feasibility gap $\|A\bar{u}^k + B\bar{v}^k - c\|$
in the worst case.

Paper organization: Section 2 briefly presents a primal-dual formulation of problem (2) under
basic assumptions, and characterizes its optimality condition. Section 3 deals with a smoothing
technique for the primal-dual gap function. Section 4 presents a smoothing AMA algorithm
and analyzes its convergence. The strongly convex case is also studied in this section. Section 5
is devoted to developing a smoothing ADMM algorithm and analyzing its convergence. Section 6
presents numerical experiments to verify the performance of our algorithms. We conclude
with a discussion of our results in the context of existing work. For clarity of exposition, several
technical and new proofs are moved to the appendix.

Notation: In the sequel, we refer to (2) as the primal problem. We work on the real spaces $\mathbb{R}^p$
and $\mathbb{R}^n$, endowed with the inner product $\langle x, \lambda \rangle$ and the standard Euclidean norm $\|\cdot\|$. We use
the superscript $T$ for both the transpose and adjoint operators. For a convex function $f$, we
use $\partial f$ for its subdifferential, and $f^*$ for its Fenchel conjugate. For a convex set $\mathcal{X}$, we use $\delta_{\mathcal{X}}$
for its indicator function, and $\mathrm{ri}(\mathcal{X})$ for its relative interior. We also use $\mathbb{R}_{++}$ for the set of
positive real numbers.

For any proper, closed and convex function $\varphi : \mathbb{R}^p \to \mathbb{R} \cup \{+\infty\}$, the proximal operator is
defined as follows:

$$ \mathrm{prox}_{\varphi}(x) := \arg\min_{z} \big\{ \varphi(z) + (1/2)\|z - x\|^2 \big\}. \tag{4} $$

Generally, computing $\mathrm{prox}_{\varphi}$ is intractable. However, if $\mathrm{prox}_{\varphi}$ can be computed in a closed form
efficiently or in polynomial time, then we say that $\varphi$ has a tractable proximity operator. Several
examples can be found in the literature.
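
As one standard instance of a tractable proximity operator, the sketch below implements definition (4) for $\varphi = t\|\cdot\|_1$, whose minimizer is the well-known entrywise soft-thresholding map. The function names and the test vector are illustrative choices, not taken from the paper.

```python
def prox_l1(x, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding), applied entrywise."""
    out = []
    for xi in x:
        if xi > t:
            out.append(xi - t)
        elif xi < -t:
            out.append(xi + t)
        else:
            out.append(0.0)
    return out

def prox_objective(z, x, t):
    """Value of t*||z||_1 + (1/2)||z - x||^2, the function minimized in (4)."""
    l1 = sum(abs(zi) for zi in z)
    quad = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
    return t * l1 + 0.5 * quad

x = [3.0, -0.5, 1.5]
z_star = prox_l1(x, 1.0)
```

Evaluating `prox_objective` at `z_star` and at nearby points gives a quick numerical check that the closed form indeed minimizes the objective in (4).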
2 Preliminaries: Lagrangian primal-dual formulation

This section briefly describes the primal-dual formulation of (2) and our fundamental
assumptions.

2.1 The dual problem 

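Let $x := (u, v)$ be the primal variable, $\mathrm{dom}(f) := \mathrm{dom}(g) \times \mathrm{dom}(h)$,
and $\mathcal{D} := \{(u, v) \in \mathrm{dom}(f) : Au + Bv = c\}$ be the feasible set of (2). We define the Lagrange
function of (2) associated with $Au + Bv = c$ as $\mathcal{L}(x, \lambda) := g(u) + h(v) - \langle \lambda, Au + Bv - c \rangle$, where
$\lambda \in \mathbb{R}^n$ is the Lagrange multiplier. We recall the dual problem (3) of (2) here:

$$ d^{\star} := \min_{\lambda} \Big\{ d(\lambda) := \max_{u \in \mathrm{dom}(g)} \big\{ \langle A^T\lambda, u \rangle - g(u) \big\} + \max_{v \in \mathrm{dom}(h)} \big\{ \langle B^T\lambda, v \rangle - h(v) \big\} - c^T\lambda \Big\}, \tag{5} $$

where $d$ is the dual function, whose two terms can be computed individually as

$$ \begin{cases} \varphi(\lambda) := \max_{u \in \mathrm{dom}(g)} \big\{ \langle A^T\lambda, u \rangle - g(u) \big\} = g^*(A^T\lambda), \\ \psi(\lambda) := \max_{v \in \mathrm{dom}(h)} \big\{ \langle B^T\lambda, v \rangle - h(v) \big\} - c^T\lambda = h^*(B^T\lambda) - c^T\lambda. \end{cases} \tag{6} $$

Let us denote by $u^*(\lambda)$ and $v^*(\lambda)$ one solution of these subproblems, respectively, if they exist.
In this case, using the optimality condition, we have $A^T\lambda \in \partial g(u^*(\lambda))$, which is equivalent to
$u^*(\lambda) \in \partial g^*(A^T\lambda)$. Similarly, $B^T\lambda \in \partial h(v^*(\lambda))$, which is equivalent to $v^*(\lambda) \in \partial h^*(B^T\lambda)$. These
dual components are convex, but generally nonsmooth. Subgradient or bundle-type methods
for directly solving (3) are generally inefficient.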

2.2 Our assumptions 

Let us denote by $\mathcal{X}^{\star}$ the solution set of (2). We say that the Slater condition holds for (2) if
we have

$$ \mathrm{ri}(\mathrm{dom}(f)) \cap \{(u, v) \in \mathbb{R}^p : Au + Bv = c\} \neq \emptyset, \tag{7} $$

where $\mathrm{ri}(\mathcal{X})$ is the relative interior of $\mathcal{X}$ (see [28]).

For the primal-dual pair (2)-(3), we require the following assumption:

Assumption A.1 The functions $g$ and $h$ are proper, closed, and convex. The solution set $\mathcal{X}^{\star}$ of
(2) is nonempty. Either $\mathrm{dom}(f)$ is polyhedral or the Slater condition (7) holds.


2.3 Zero duality gap 

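Under Assumption A.1, the solution set $\Lambda^{\star}$ of the dual problem (3) is nonempty and bounded.
Moreover, strong duality holds, i.e., $f^{\star} + d^{\star} = 0$. From the classical duality theory, we have
$f(x) + d(\lambda) \ge 0$ for any feasible primal-dual point $(x, \lambda)$. Hence, the duality gap function $G$
defined by

$$ G(w) := f(x) + d(\lambda), \quad \text{where } w := (x, \lambda), \tag{8} $$

satisfies $G(w) \ge 0$ for all $x \in \mathcal{D}$ and $\lambda \in \mathbb{R}^n$. Clearly, $G(w^{\star}) = 0$ (zero duality gap) for any
primal-dual solution $w^{\star} := (x^{\star}, \lambda^{\star}) \in \mathcal{X}^{\star} \times \Lambda^{\star}$. In addition, $w^{\star}$ is a saddle point of the Lagrange
function; that is, $\mathcal{L}(x^{\star}, \lambda) \le \mathcal{L}(x^{\star}, \lambda^{\star}) = f^{\star} = -d^{\star} \le \mathcal{L}(x, \lambda^{\star})$ for all $x \in \mathrm{dom}(f)$ and $\lambda \in \mathbb{R}^n$.
The optimality condition of (2)-(3) can be written as

$$ Au^{\star} + Bv^{\star} = c, \quad A^T\lambda^{\star} \in \partial g(u^{\star}), \quad \text{and} \quad B^T\lambda^{\star} \in \partial h(v^{\star}). \tag{9} $$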
3 Smoothing the primal-dual gap function

The dual function $d$ defined by (5) is convex, but it is generally nonsmooth. Our key idea is
to replace the component $g^*$ in (3) with a new smoothed approximation $g^*_{\gamma}$ to derive new
algorithms.

Let us consider the domain $\mathcal{U} := \mathrm{dom}(g)$ of $g$. Associated with $\mathcal{U}$, we choose a proximity
function $\omega$, i.e., $\omega$ is continuous and strongly convex with the convexity parameter $\mu_{\omega} = 1 > 0$,
and $\mathcal{U} \subseteq \mathrm{dom}(\omega)$. In addition, we assume that $\omega$ is smooth, and its gradient is Lipschitz
continuous with the Lipschitz constant $L_{\omega} \in [0, +\infty)$.

Given $\omega$, we define a Bregman distance $b_{\mathcal{U}}(u, \hat{u}) := \omega(u) - \omega(\hat{u}) - \langle \nabla\omega(\hat{u}), u - \hat{u} \rangle$. We fix
$u_c \in \mathrm{dom}(g)$ and consider the function $b_{\mathcal{U}}(\cdot, u_c)$. Clearly, $b_{\mathcal{U}}(\cdot, u_c)$ is smooth and strongly
convex with the convexity parameter $\mu_b = \mu_{\omega} = 1$. Its gradient $\nabla_1 b_{\mathcal{U}}(\cdot, u_c) = \nabla\omega(\cdot) - \nabla\omega(u_c)$ is
Lipschitz continuous with the Lipschitz constant $L_b = L_{\omega} \ge \mu_{\omega} = 1$. In addition, $b_{\mathcal{U}}(u_c, u_c) = 0$
and $\nabla_1 b_{\mathcal{U}}(u_c, u_c) = 0$.

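Given the Bregman distance $b_{\mathcal{U}}$, $u_c \in \mathcal{U}$, and the conjugate $g^*$ of $g$, we define

$$ g^*_{\gamma}(z) := \max_{u \in \mathrm{dom}(g)} \big\{ \langle z, u \rangle - g(u) - \gamma b_{\mathcal{U}}(u, u_c) \big\}, \tag{10} $$

where $\gamma > 0$ is a smoothness parameter. We denote by $u^*_{\gamma}(z)$ the solution of the maximization
problem in (10), i.e.:

$$ u^*_{\gamma}(z) := \arg\max_{u \in \mathrm{dom}(g)} \big\{ \langle z, u \rangle - g(u) - \gamma b_{\mathcal{U}}(u, u_c) \big\}, \tag{11} $$

which is well-defined and unique. Clearly, $\nabla g^*_{\gamma}(z) = u^*_{\gamma}(z)$ is the gradient of $g^*_{\gamma}$, which is
Lipschitz continuous with the Lipschitz constant $1/\gamma$. Hence, $g^*_{\gamma}$ is $(1/\gamma)$-smooth. We define

$$ D_f := \sup_{u, v} \big\{ \|Au + Bv - c\| : u \in \mathrm{dom}(g),\ v \in \mathrm{dom}(h) \big\}. \tag{12} $$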
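
The smoothing in (10)-(11) can be illustrated on a scalar toy case. The sketch below assumes $g = |\cdot|$ on the real line and the Euclidean choice $b_{\mathcal{U}}(u, u_c) = (1/2)(u - u_c)^2$; under these assumptions the maximizer in (11) reduces to a soft-thresholding step. All names are hypothetical and chosen only for this illustration.

```python
def soft(x, t):
    """Scalar soft-thresholding, i.e. the proximal operator of t*|.|."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def u_gamma(z, gamma, u_c=0.0):
    """Maximizer in (11) for g = |.| and b(u, u_c) = (1/2)(u - u_c)^2."""
    return soft(u_c + z / gamma, 1.0 / gamma)

def g_star_gamma(z, gamma, u_c=0.0):
    """Smoothed conjugate value from (10), evaluated at the maximizer."""
    u = u_gamma(z, gamma, u_c)
    return z * u - abs(u) - 0.5 * gamma * (u - u_c) ** 2

gamma = 0.5
grid = [i / 10.0 for i in range(-40, 41)]
# Largest observed slope of z -> u_gamma(z); it should not exceed 1/gamma.
max_slope = max(
    abs(u_gamma(b, gamma) - u_gamma(a, gamma)) / (b - a)
    for a, b in zip(grid, grid[1:])
)
```

In this toy case the unsmoothed conjugate $g^*$ is the indicator of $[-1, 1]$, so it is infinite at $z = 2$, whereas the smoothed value `g_star_gamma(2.0, 0.5)` is finite and its gradient varies with slope at most $1/\gamma$.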


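Let $g^*_{\gamma}$ and $\psi$ be defined by (10) and (6), respectively, and $\beta > 0$. We consider

$$ \begin{cases} d_{\gamma}(\lambda) := g^*_{\gamma}(A^T\lambda) + \big( h^*(B^T\lambda) - \langle c, \lambda \rangle \big) = \varphi_{\gamma}(\lambda) + \psi(\lambda), \\ f_{\beta}(x) := g(u) + h(v) + \frac{1}{2\beta}\|Au + Bv - c\|^2, \\ G_{\gamma\beta}(w) := f_{\beta}(x) + d_{\gamma}(\lambda). \end{cases} \tag{13} $$

If $\gamma \downarrow 0^+$, then we have $d_{\gamma}(\lambda) \to d(\lambda)$. Hence, $d_{\gamma}$ is a smoothed approximation of $d$, but it is
not fully smooth. For any feasible point $x = (u, v) \in \mathcal{D}$, we have $f_{\beta}(x) = f(x)$. Here, $f_{\beta}$ can be
considered as an approximation to $f$ near the feasible set $\mathcal{D}$. Hence, the smoothed gap function
$G_{\gamma\beta}$ is an approximation of the duality gap function $G(\cdot)$ in (8). Moreover, the smoothed
gap function $G_{\gamma\beta}(\cdot)$ is convex. The following lemma shows us how to use $G_{\gamma\beta}$ to characterize
the primal-dual solutions for (2)-(3); its proof is in Appendix 8.1.

Lemma 1 For any $\bar{x}^k := (\bar{u}^k, \bar{v}^k) \in \mathrm{dom}(f)$ and $\bar{\lambda}^k \in \mathbb{R}^n$, it holds that

$$ -\|\lambda^{\star}\| \|A\bar{u}^k + B\bar{v}^k - c\| \le f(\bar{x}^k) - f^{\star} \le f(\bar{x}^k) + d(\bar{\lambda}^k). \tag{14} $$

Let $\{\bar{w}^k\}$ be an arbitrary sequence in $\mathrm{dom}(f) \times \mathbb{R}^n$ and $\{(\gamma_k, \beta_k)\}$ be a sequence in $\mathbb{R}^2_{++}$. Then,
the following estimates hold:

$$ \begin{cases} f(\bar{x}^k) - f^{\star} \le S_k(\bar{w}^k), \\ \|A\bar{u}^k + B\bar{v}^k - c\| \le 2\beta_k\|\lambda^{\star}\| + \sqrt{2\beta_k S_k(\bar{w}^k)}, \\ d(\bar{\lambda}^k) - d^{\star} \le 2\beta_k\|\lambda^{\star}\|^2 + \|\lambda^{\star}\|\sqrt{2\beta_k S_k(\bar{w}^k)} + S_k(\bar{w}^k), \end{cases} \tag{15} $$

where $S_k(\bar{w}^k) := G_{\gamma_k\beta_k}(\bar{w}^k) + \gamma_k b_{\mathcal{U}}(u^{\star}, u_c)$, which requires the values of $G_{\gamma_k\beta_k}$.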
Computing exactly a primal-dual solution $(x^{\star}, \lambda^{\star})$ is impractical. Hence, our objective is
to find an approximation $(\bar{x}^k, \bar{\lambda}^k)$ to $(x^{\star}, \lambda^{\star})$ in the following sense:

Definition 1 Given an accuracy $\varepsilon > 0$, a primal-dual point $(\bar{x}^k, \bar{\lambda}^k) \in \mathrm{dom}(f) \times \mathbb{R}^n$ is said to
be an $\varepsilon$-solution of (2)-(3) if

$$ f(\bar{x}^k) - f^{\star} \le \varepsilon, \quad \|A\bar{u}^k + B\bar{v}^k - c\| \le \varepsilon, \quad \text{and} \quad d(\bar{\lambda}^k) - d^{\star} \le \varepsilon. $$

We use the same accuracy parameter $\varepsilon$ for each of these terms for simplicity.

We note that by combining $\|A\bar{u}^k + B\bar{v}^k - c\| \le \varepsilon$ and (14), we can guarantee a lower bound
$f(\bar{x}^k) - f^{\star} \ge -\|\lambda^{\star}\|\varepsilon$. In addition, the domain $\mathrm{dom}(f)$ is usually simple (e.g., box, ball,
cone, or simplex) so that the constraint $\bar{x}^k \in \mathrm{dom}(f)$ can be guaranteed via a closed form
projection onto $\mathrm{dom}(f)$.
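
Definition 1 and the lower bound derived from (14) translate into a simple check on three residual values. The sketch below assumes these residuals are already available as numbers (in practice $f^{\star}$ and $d^{\star}$ are unknown, so this is only useful on test problems with known optima); the function names are hypothetical.

```python
def is_eps_solution(obj_residual, feas_gap, dual_residual, eps):
    """Check the three inequalities of Definition 1 for given residual values."""
    return obj_residual <= eps and feas_gap <= eps and dual_residual <= eps

def objective_lower_bound(lam_star_norm, feas_gap):
    """Lower bound on f(x) - f* implied by the left inequality of (14)."""
    return -lam_star_norm * feas_gap

accepted = is_eps_solution(1e-4, 5e-4, 2e-4, eps=1e-3)
rejected = is_eps_solution(1e-4, 5e-3, 2e-4, eps=1e-3)  # feasibility gap too large
bound = objective_lower_bound(2.0, 1e-3)
```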

The goal is to generate a primal-dual sequence $\{\bar{w}^k\}$ and a parameter sequence $\{(\gamma_k, \beta_k)\}$ in
Lemma 1 such that $\{G_{\gamma_k\beta_k}(\bar{w}^k)\}$ converges to $0$ and $\{(\gamma_k, \beta_k)\}$ also converges to zero. Moreover,
the convergence rate of $f(\bar{x}^k) - f^{\star}$ and $\|A\bar{u}^k + B\bar{v}^k - c\|$ depends on the convergence rate of
$\{G_{\gamma_k\beta_k}(\bar{w}^k)\}$ and $\{(\gamma_k, \beta_k)\}$.


4 Smoothing Alternating Minimization Algorithm (SAMA) 


We propose a new alternating direction method via the application of the accelerated
forward-backward splitting to the smoothed gap function. We describe SAMA in three subsections:
the core steps, the initialization, and the parameter updates.


4.1 The core steps 

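At the iteration $k \ge 0$, given $\hat{\lambda}^k \in \mathbb{R}^n$ and the parameters $\gamma_{k+1} > 0$ and $\eta_k > 0$, the core steps
of our SAMA consist of two primal alternating direction steps and one dual ascent step as
follows:

$$ \begin{cases} \hat{u}^{k+1} := \arg\min_{u \in \mathrm{dom}(g)} \big\{ g(u) - \langle A^T\hat{\lambda}^k, u \rangle + \gamma_{k+1} b_{\mathcal{U}}(u, u_c) \big\}, \\ \hat{v}^{k+1} := \arg\min_{v \in \mathrm{dom}(h)} \big\{ h(v) - \langle B^T\hat{\lambda}^k, v \rangle + \frac{\eta_k}{2}\|A\hat{u}^{k+1} + Bv - c\|^2 \big\}, \\ \bar{\lambda}^{k+1} := \hat{\lambda}^k - \eta_k (A\hat{u}^{k+1} + B\hat{v}^{k+1} - c), \end{cases} \tag{SAMA} $$

where $\gamma_{k+1}$ and $\eta_k$ are referred to as the smoothness and the penalty parameter, respectively,
and $u_c$ is a chosen center point.

The subproblems in SAMA can often be computed in a closed form. Let us describe two
cases. First, if $b_{\mathcal{U}}(\cdot, u_c) := (1/2)\|\cdot - u_c\|^2$, the standard Euclidean distance, then computing
$\hat{u}^{k+1}$ reduces to computing the proximal operator of $g$, i.e.,

$$ \hat{u}^{k+1} = \mathrm{prox}_{\gamma_{k+1}^{-1} g}\big( u_c + \gamma_{k+1}^{-1} A^T\hat{\lambda}^k \big). $$

Second, if we have $B = I$ or $B$ is orthonormal, then computing $\hat{v}^{k+1}$ reduces to computing the
proximal operator of $h$, i.e.,

$$ \hat{v}^{k+1} = \mathrm{prox}_{\eta_k^{-1} h}\big( B^T(c - A\hat{u}^{k+1}) + \eta_k^{-1} B^T\hat{\lambda}^k \big). $$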
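
The three updates of (SAMA) with the two closed forms above can be sketched for a scalar toy instance. The example assumes $g = |\cdot|$, $h = |\cdot - 1|$, a scalar $A = a$, and $B = b$ with $b \in \{+1, -1\}$ so that $B$ is orthonormal; these choices and every function name are hypothetical and serve only to make one step checkable by hand.

```python
def soft(x, t):
    """Scalar soft-thresholding, the proximal operator of t*|.|."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def prox_abs(x, t):
    """prox of t*|.| (toy choice for g)."""
    return soft(x, t)

def prox_abs_shifted(x, t):
    """prox of t*|. - 1| (toy choice for h)."""
    return 1.0 + soft(x - 1.0, t)

def sama_core_step(lam_hat, u_c, gamma, eta, a, b, c, prox_g, prox_h):
    """One pass through the three (SAMA) updates for scalar u, v and b in {+1, -1}."""
    # u-step: prox of g/gamma at u_c + a*lam_hat/gamma (Euclidean Bregman distance).
    u_hat = prox_g(u_c + a * lam_hat / gamma, 1.0 / gamma)
    # v-step: prox of h/eta, valid here because b*b == 1.
    v_hat = prox_h(b * (c - a * u_hat) + b * lam_hat / eta, 1.0 / eta)
    # dual step with penalty parameter eta.
    lam_bar = lam_hat - eta * (a * u_hat + b * v_hat - c)
    return u_hat, v_hat, lam_bar

u_hat, v_hat, lam_bar = sama_core_step(
    lam_hat=2.0, u_c=0.0, gamma=1.0, eta=0.5, a=1.0, b=-1.0, c=0.0,
    prox_g=prox_abs, prox_h=prox_abs_shifted,
)
```

For these inputs the $u$-step gives $1$, the $v$-step gives $-1$ (which can be verified from the first-order condition of the $v$-subproblem), and the dual update gives $1$.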


By inspection, it is easy to see that SAMA is an analog of the classical AMA.
The first subproblem, due to (10), corresponds to the forward step while the last two lines
correspond to the backward step. Moreover, if we set $\gamma_{k+1} = 0$ and $\hat{\lambda}^{k+1} = \bar{\lambda}^{k+1}$, SAMA becomes
the AMA. However, in contrast to the AMA, the SAMA also features a dual acceleration and
a primal weighted averaging step:

$$ \begin{cases} \hat{\lambda}^k := (1 - \tau_k)\bar{\lambda}^k + \tau_k\lambda^*_k, & \text{(dual acceleration)} \\ (\bar{u}^{k+1}, \bar{v}^{k+1}) := (1 - \tau_k)(\bar{u}^k, \bar{v}^k) + \tau_k(\hat{u}^{k+1}, \hat{v}^{k+1}), & \text{(weighted averaging)} \end{cases} \tag{16} $$

where $\lambda^*_k := \beta_k^{-1}(c - A\bar{u}^k - B\bar{v}^k)$, and $\tau_k \in (0, 1)$ is a given step size.

The following lemma provides conditions showing that the sequence $\{(\bar{x}^k, \bar{\lambda}^k)\}$ generated
by (SAMA)-(16) maintains the non-monotone gap reduction condition introduced in [33]. The
proof of this lemma can be found in Appendix 8.2.2.


Lemma 2 Let $\{\bar{w}^k\}$ with $\bar{w}^k := (\bar{u}^k, \bar{v}^k, \bar{\lambda}^k)$ be the sequence generated by (SAMA)-(16). If $\tau_k \in (0, 1]$
and $\gamma_k, \beta_k, \eta_k \in \mathbb{R}_{++}$ satisfy the following conditions:


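$$ \Big(1 + \frac{\tau_k}{L_b}\Big)\gamma_{k+1} \ge \gamma_k, \quad \beta_{k+1} \ge (1 - \tau_k)\beta_k, \quad (1 - \tau_k^2)\gamma_{k+1}\beta_k \ge 2\|A\|^2\tau_k^2, \quad \text{and} \quad 2\|A\|^2\eta_k = \gamma_{k+1}, \tag{17} $$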


then the following non-monotone gap reduction condition holds: 


< (1 - + 


-fcs, I Vk''~k 7-)2 


(18) 


where $G_{\gamma_k\beta_k}$ is defined by (13) and $D_f$ is defined by (12).


4.2 Initialization 

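We note that we can initialize the algorithm at any starting point $\bar{w}^0 := (\bar{u}^0, \bar{v}^0, \bar{\lambda}^0)$. However,
the convergence bounds will depend on $G_{\gamma_0\beta_0}(\bar{w}^0)$. In order to provide transparent convergence
results, we propose to use the following initialization in Lemma 3, whose proof is given in
Appendix 8.2.1.

Lemma 3 Given $\hat{\lambda}^0 \in \mathbb{R}^n$, $\gamma_1 > 0$, and $\eta_0 > 0$, let $\bar{w}^1 := (\bar{u}^1, \bar{v}^1, \bar{\lambda}^1)$ be computed by

$$ \begin{cases} \bar{u}^1 := \arg\min_{u \in \mathrm{dom}(g)} \big\{ g(u) - \langle A^T\hat{\lambda}^0, u \rangle + \gamma_1 b_{\mathcal{U}}(u, u_c) \big\}, \\ \bar{v}^1 := \arg\min_{v \in \mathrm{dom}(h)} \big\{ h(v) - \langle B^T\hat{\lambda}^0, v \rangle + \frac{\eta_0}{2}\|A\bar{u}^1 + Bv - c\|^2 \big\}, \\ \bar{\lambda}^1 := \hat{\lambda}^0 - \eta_0 (A\bar{u}^1 + B\bar{v}^1 - c). \end{cases} \tag{19} $$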


Then, for any /3i > 0, and G-^p defined by (13) satisfy 


r ^ (57i-2r?o||A||2)ryo 


271 


A^-A‘'r-h7o'(A'',A^-A‘'). (20) 


Consequently, if we choose 71 , Pi and po such that 671 > 27 o||A|p and Pi > 
G^,p,{w^)<I^D}+p-^{X°,X^-X^). 
4.3 Updating the parameters

For simplicity of presentation, we choose $b_{\mathcal{U}}$ such that $L_b = 1$. For example, we can choose
$b_{\mathcal{U}}(\cdot, u_c) := (1/2)\|\cdot - u_c\|^2$ for a fixed $u_c \in \mathcal{U}$. Hence, we can update $\tau_k$, $\gamma_k$, $\beta_k$ and $\eta_k$ such that the
equality in the conditions (17) holds. The following lemma provides one possibility to update
these parameters, whose proof is given in Appendix 8.2.3.

Lemma 4 Let $b_{\mathcal{U}}$ be chosen such that $L_b = 1$ and $\gamma_1 > 0$. Then, for $k \ge 1$, if $\tau_k$, $\gamma_k$, $\beta_k$, and $\eta_k$
are updated by

$$ \tau_k := \frac{3}{k + 4}, \quad \gamma_k := \frac{5\gamma_1}{k + 4}, \quad \beta_k := \frac{18\|A\|^2(k + 5)}{5\gamma_1(k + 1)(k + 7)}, \quad \text{and} \quad \eta_k := \frac{5\gamma_1}{2\|A\|^2(k + 5)}, \tag{21} $$

then they satisfy conditions (17). Moreover, the convergence rate of $\{\tau_k\}$ is optimal, and
$\beta_k < \frac{18\|A\|^2}{5\gamma_1(k + 1)}$.

Let us comment here on our weighting strategy and its relation to the scheme that places
emphasis on the later iterates in averaging by using $w_i = i + 1$, as described by (44) in Section 7.
In our updates, we consider another weighting scheme (44) that places even more emphasis
on the later iterates. For this purpose, we use $w_i = (i + 1)(i + 2)$ and rewrite (44) in a way to
mimic the averaging step in (16): $\bar{x}^{k+1} = (1 - \tau_k)\bar{x}^k + \tau_k\hat{x}^{k+1}$. Hence, our particular primal
weighting scheme in (SAMA) uses $\tau_k = \frac{3}{k + 4}$.
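
The link between the weights $w_i = (i + 1)(i + 2)$ and the step size $\tau_k = 3/(k + 4)$ can be verified directly: since $\sum_{i=0}^{m} (i+1)(i+2) = (m+1)(m+2)(m+3)/3$, the ratio $w_{k+1} / \sum_{i=0}^{k+1} w_i$ equals $3/(k+4)$. The sketch below checks this on scalar data with exact arithmetic; indexing the points from $i = 0$ and the sample values are assumptions made for illustration.

```python
from fractions import Fraction

def weighted_average(points):
    """Average of points[0..k] with weights w_i = (i + 1)(i + 2)."""
    weights = [(i + 1) * (i + 2) for i in range(len(points))]
    return sum(Fraction(w) * p for w, p in zip(weights, points)) / sum(weights)

def recursive_average(points):
    """Same average via x_bar <- (1 - tau_k) x_bar + tau_k x_hat, tau_k = 3/(k + 4)."""
    x_bar = Fraction(points[0])
    for k, x_hat in enumerate(points[1:]):
        tau = Fraction(3, k + 4)
        x_bar = (1 - tau) * x_bar + tau * x_hat
    return x_bar

points = [Fraction(v) for v in (4, -1, 7, 2, 10, 3)]
direct = weighted_average(points)
recursive = recursive_average(points)
```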


4.4 The new smoothing AMA algorithm 

Since $\lambda^*_k$ in the first line of (16) requires one matrix-vector multiplication $(A\bar{u}^k, B\bar{v}^k)$, we can
combine the third line of (SAMA) and the second line of (16) to compute $\lambda^*_k$ recursively as

$$ \lambda^*_{k+1} := \beta_{k+1}^{-1}\big[ (1 - \tau_k)\beta_k\lambda^*_k + \tau_k\eta_k^{-1}(\bar{\lambda}^{k+1} - \hat{\lambda}^k) \big]. \tag{22} $$

Then, our algorithm only requires one matrix-vector multiplication $(Au, Bv)$ and one adjoint
operation $(A^T\lambda, B^T\lambda)$ per iteration. Hence, the arithmetic cost per iteration of (SAMA) and
the standard AMA are essentially the same. We can then combine the main steps (SAMA),
(16), (22), and the update rule (21) to complete the smoothing alternating minimization
algorithm (SAMA) in Algorithm 1.

We can view Algorithm 1 as a primal-dual method, where we apply Nesterov's accelerated
method to the smoothed dual problem while using a weighted averaging scheme
$\bar{x}^k = \big(\sum_{i=0}^{k} w_i\big)^{-1} \sum_{i=0}^{k} w_i\hat{x}^i$ for the primal variables. However, Algorithm 1 aims at solving the
nonsmooth problem (2) without any additional assumption on $g$ and $h$ except for the finiteness
of $D_f$ in (12).


4.5 Convergence analysis 

We prove in Appendix 8.2.4 the convergence and the worst-case iteration-complexity of
Algorithm 1 in Theorem 1.
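Algorithm 1 (Smoothing Alternating Minimization Algorithm (SAMA))

Initialization:

1: Fix $u_c \in \mathrm{dom}(g)$. Choose $\hat{\lambda}^0 \in \mathbb{R}^n$ and $\gamma_1 > 0$. Set $\eta_0 := \frac{\gamma_1}{2\|A\|^2}$ and $\beta_1 := \frac{27\|A\|^2}{20\gamma_1}$.

2: Compute $\bar{u}^1 := \mathrm{prox}_{\gamma_1^{-1} g}\big( u_c + \gamma_1^{-1} A^T\hat{\lambda}^0 \big)$.

3: Solve $\bar{v}^1 := \arg\min_{v} \big\{ h(v) - \langle \hat{\lambda}^0, Bv \rangle + \frac{\eta_0}{2}\|A\bar{u}^1 + Bv - c\|^2 \big\}$.

4: Update $\bar{\lambda}^1 := \hat{\lambda}^0 - \eta_0(A\bar{u}^1 + B\bar{v}^1 - c)$ and $\lambda^*_1 := \beta_1^{-1}(c - A\bar{u}^1 - B\bar{v}^1)$.

Iteration: For $k = 1$ to $k_{\max}$, perform:

5: Compute $\tau_k := \frac{3}{k+4}$, $\gamma_{k+1} := \frac{5\gamma_1}{k+5}$, $\beta_k := \frac{18\|A\|^2(k+5)}{5\gamma_1(k+1)(k+7)}$ and $\eta_k := \frac{5\gamma_1}{2\|A\|^2(k+5)}$.

6: Set $\hat{\lambda}^k := (1 - \tau_k)\bar{\lambda}^k + \tau_k\lambda^*_k$.

7: Compute $\hat{u}^{k+1} := \mathrm{prox}_{\gamma_{k+1}^{-1} g}\big( u_c + \gamma_{k+1}^{-1} A^T\hat{\lambda}^k \big)$.

8: Solve $\hat{v}^{k+1} := \arg\min_{v} \big\{ h(v) - \langle \hat{\lambda}^k, Bv \rangle + \frac{\eta_k}{2}\|A\hat{u}^{k+1} + Bv - c\|^2 \big\}$.

9: Update $\bar{\lambda}^{k+1} := \hat{\lambda}^k - \eta_k(A\hat{u}^{k+1} + B\hat{v}^{k+1} - c)$.

10: Compute $\lambda^*_{k+1} := \beta_{k+1}^{-1}\big[ (1 - \tau_k)\beta_k\lambda^*_k + \tau_k\eta_k^{-1}(\bar{\lambda}^{k+1} - \hat{\lambda}^k) \big]$.

11: Update $\bar{u}^{k+1} := (1 - \tau_k)\bar{u}^k + \tau_k\hat{u}^{k+1}$ and $\bar{v}^{k+1} := (1 - \tau_k)\bar{v}^k + \tau_k\hat{v}^{k+1}$.

End for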


Theorem 1 Let $\{\bar{w}^k\}$ be the sequence generated by Algorithm 1. Then, for any $\gamma_1 > 0$, the following
estimates hold


f{xn - r 


< 571 

- (fc+4) 


II R,7_cII < ■=‘°ll^ll 11-^ II I oil^ll 

||AM +BV C|| S 57i(fc+l) + (fe+1) 


36||A||"||A* 


+ 


3||A|| 


9Dj 

8||A|P(fc+3) 


+ 


9D) 

8||A|P(fe+7)’ 


d[\^) - d* 


, 36||A||"||A*|| 
— 571 (fe+ 1 ) 


- + 


6||A||||A* 

(fe+1) 


-+ 


9Dj 

8||A|P(fc+7) 


571 

(fe+iy 


2 "'“s||A|P(fe+3) 


(23) 


where $D_f$ is defined by (12). As a consequence, if we choose $\gamma_1 := \|A\|$, then the worst-case
iteration-complexity of Algorithm 1 to achieve an $\varepsilon$-primal solution $\bar{x}^k$ of (2) in the sense of
Definition 1 is $\mathcal{O}(\varepsilon^{-1})$.


Theorem 1 shows that the convergence rate of Algorithm 1 consists of two parts. While the
first part depends on $\|u_c - u^{\star}\|^2$, which is only $\mathcal{O}(1/k)$, the second part depending on $D_f$ is up
to $\mathcal{O}(1/k^2)$. We can obtain the convergence rate of the feasibility gap $\|A\bar{u}^k + B\bar{v}^k - c\|$ from
the dual convergence as done in prior work. However, this rate is only $\mathcal{O}(1/\sqrt{k})$ when the rate
on the dual $d(\bar{\lambda}^k) - d^{\star}$ is $\mathcal{O}(1/k)$.


4.6 Special case: g is strongly convex 

We now consider a special case of the constrained problem (2) when $g$ is strongly convex. If $g$
is strongly convex with the convexity parameter $\mu_g > 0$, then we can modify Algorithm 1 so
that $d(\bar{\lambda}^k) - d^{\star} \le \mathcal{O}(1/k^2)$ in terms of the dual objective function as shown in [15]. However, the
convergence rate in terms of the primal objective residual $f(\bar{x}^k) - f^{\star}$ and the primal feasibility
gap $\|A\bar{u}^k + B\bar{v}^k - c\|$ that we can prove is worse than $\mathcal{O}(1/k^2)$.

Let us consider again the dual function $\varphi$ defined by (6). Since $g$ is strongly convex with
the strong convexity parameter $\mu_g > 0$, $\nabla\varphi$ is Lipschitz continuous with the Lipschitz constant
$L_{\varphi} := \|A\|^2/\mu_g$. We modify Algorithm 1 in order to obtain a new variant that captures the strong
convexity of $g$ and removes the smoothness parameter $\gamma_k$. By a similar analysis as in Lemma 2,
we can show in Appendix 8.3 that if the following conditions hold



/3fc+i>(l 7-fe)/3fc and ^(2 

(24) 

then 

(«)"+') < (1 - rk)Gp,{w’^) + 

(25) 

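where $G_{\beta_k}(\bar{w}^k) := f_{\beta_k}(\bar{x}^k) + d(\bar{\lambda}^k)$. The first iterate $\bar{u}^1$ in (19) can be computed as

$$ \bar{u}^1 := \arg\min_{u \in \mathrm{dom}(g)} \big\{ g(u) - \langle \hat{\lambda}^0, Au \rangle \big\}. \tag{26} $$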


Using (26) and new update rules for the parameters in Algorithm 1, we obtain a new variant
of Algorithm 1. The following corollary shows the convergence of this variant, whose proof is
also moved to Appendix 8.3.


Corollary 1 Let $\{\bar{w}^k\}$ be the sequence generated by Algorithm 1 using (26) and the update rules


Tk ■ = 


e (0,1), Vk ■= 




fc + 4 

Then, the following estimates hold 


2 ||AP 


and Pk '■ = 




18||Af 


h-g{k+^){k + l)' 


{f{x^) - r < 

I WAu'^ + - c\\ < 


16||A||2(fe+3) 

36||A||"||A*|| 9Df 

fj.g{k+l){k+7) 2^(fe + l)(fc+3)(/c+7) 



Alternatively, if we use the following update rules in Algorithm^ 


3 

k + A 


£ (0i 1)1 Vk 


V-gXk 

PP’ 


and Pk • = 


mfrk 

3Ms(1 - T~k) 


2 PP 

flg{k+l)’ 


then 


\\Au^ + - c\ 


^ 27ngD} 

- 4||A||2(fc+3) = 

, 4||A||=^||A*|| 

— Msife+l) 


, 3V3 Dj 

(A;+3) pk-\-l 



(27) 


(28) 


(29) 


(30) 


Here, $D_f$ is defined by (12). In both cases, the guarantee of the primal-dual gap function
$G(\bar{w}^k) := f(\bar{x}^k) + d(\bar{\lambda}^k)$ is




(31) 


where $\beta_k$ is given by either (27) or (29).


We note that, similar to [15], if we modify Stepj^of Algorithmj^by := + (A^+^ — 

A*), then we can prove the 0(-^)-convergence rate for the dual objective residual d(A^) — d* 
in Algorithm under the strong convexity of g. 
4.7 Composite convex minimization involving a linear operator

A common composite convex minimization formulation in image processing and machine
learning is the following problem:

$$ \min_{u \in \mathbb{R}^p} \big\{ f(u) := g(u) + h(Fu - y) \big\}, \tag{32} $$

where $g$ and $h$ are two proper, closed and convex functions (possibly nonsmooth), $F$ is a linear
operator from $\mathbb{R}^p$ to $\mathbb{R}^n$, and $y \in \mathbb{R}^n$ is a given observed vector. We are more interested in the
case that $g$ and $h$ are nonsmooth but are equipped with a tractable proximal operator. For
example, $g$ and $h$ are both the $\ell_1$-norm.

Classical AMA and ADMM methods can solve (32) but do not have an $\mathcal{O}(1/k)$ theoretical
convergence rate guarantee without additional smoothness-type or strong convexity-type
assumptions on $g$ and $h$. In addition, the ADMM still requires solving the subproblem at the
second line of (43) iteratively when $F$ is not orthogonal.

If we introduce a new variable $v := Fu - y$, then we can reformulate (32) into (2) with $A = F$,
$B = -I$ and $c = y$. In this case, we can apply both Algorithm 1 and Algorithm 2 (in Section 5) to solve
the resulting problem without additional assumption on $g$ and $h$ except for the boundedness
of $D_f$. However, we only focus on Algorithm 1, which only requires the proximal operator of $g$
and $h$. The main step of this algorithmic variant can be written explicitly as


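$$ \begin{cases} \hat{u}^{k+1} := \mathrm{prox}_{\gamma_{k+1}^{-1} g}\big( u_c + \gamma_{k+1}^{-1} F^T\hat{\lambda}^k \big), \\ \hat{v}^{k+1} := \mathrm{prox}_{\eta_k^{-1} h}\big( F\hat{u}^{k+1} - y - \eta_k^{-1}\hat{\lambda}^k \big). \end{cases} $$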


Using this step in Algorithm 1, we obtain a new variant for solving (32) using only the
proximal operator of $g$ and $h$.

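5 The new smoothing ADMM method

For completeness, we present a new alternating direction method of multipliers (ADMM)
algorithm for solving (2) by applying the Douglas-Rachford splitting method to the smoothed
dual problem. Our new algorithm, dubbed the smoothing ADMM (SADMM), features similar
optimal convergence rate guarantees as SAMA. See Section 7 for further discussion.

5.1 The main steps of the smoothing ADMM method

The main step of our SADMM scheme is as follows. Given $\hat{\lambda}^k \in \mathbb{R}^n$, $\hat{v}^k \in \mathrm{dom}(h)$ and the
parameters $\gamma_{k+1} > 0$, $\eta_k > 0$ and $\rho_k > 0$, we compute $(\hat{u}^{k+1}, \hat{v}^{k+1}, \bar{\lambda}^{k+1})$ as follows:

$$ \begin{cases} \hat{u}^{k+1} := \arg\min_{u \in \mathrm{dom}(g)} \big\{ g(u) - \langle A^T\hat{\lambda}^k, u \rangle + \frac{\rho_k}{2}\|Au + B\hat{v}^k - c\|^2 + \gamma_{k+1} b_{\mathcal{U}}(u, u_c) \big\}, \\ \hat{v}^{k+1} := \arg\min_{v \in \mathrm{dom}(h)} \big\{ h(v) - \langle B^T\hat{\lambda}^k, v \rangle + \frac{\eta_k}{2}\|A\hat{u}^{k+1} + Bv - c\|^2 \big\}, \\ \bar{\lambda}^{k+1} := \hat{\lambda}^k - \eta_k(A\hat{u}^{k+1} + B\hat{v}^{k+1} - c). \end{cases} \tag{SADMM} $$

This scheme is different from the standard ADMM scheme (43) at two points. First, $\hat{u}^{k+1}$ is
computed from the regularized subproblem with $g(\cdot) + \gamma_{k+1} b_{\mathcal{U}}(\cdot, u_c)$ instead of $g(\cdot)$. Second, we
use different penalty parameters $\rho_k$ and $\eta_k$ compared to the standard ADMM scheme (43) in
Section 7. The complexity of computing $\hat{u}^{k+1}$ in (SADMM) is essentially the same as for the
first subproblem of computing $u^{k+1}$ in (43).

As a special case, if $A = I$, the identity operator, or $A$ is orthonormal, then we can choose
$b_{\mathcal{U}}(\cdot, u_c) := (1/2)\|\cdot - u_c\|^2$ to obtain a closed form solution of $\hat{u}^{k+1}$ as

$$ \hat{u}^{k+1} := \mathrm{prox}_{(\rho_k + \gamma_{k+1})^{-1} g}\Big( (\rho_k + \gamma_{k+1})^{-1}\big( \gamma_{k+1}u_c + A^T(\hat{\lambda}^k - \rho_k(B\hat{v}^k - c)) \big) \Big). $$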
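In addition to (SADMM), our algorithm also requires additional steps

$$ \begin{cases} \hat{\lambda}^k := (1 - \tau_k)\bar{\lambda}^k + \tau_k\lambda^*_k, & \text{(dual acceleration)} \\ (\bar{u}^{k+1}, \bar{v}^{k+1}) := (1 - \tau_k)(\bar{u}^k, \bar{v}^k) + \tau_k(\hat{u}^{k+1}, \hat{v}^{k+1}), & \text{(weighted averaging)} \end{cases} \tag{33} $$

as in Algorithm 1, where $\lambda^*_k := \beta_k^{-1}(c - A\bar{u}^k - B\bar{v}^k)$, and $\tau_k \in (0, 1)$ is a step size.

We prove in Appendix 8.4.1 the following lemma, which provides conditions on the
parameters to guarantee the gap reduction condition.


Lemma 5 Let $\{\bar{w}^k\}$ with $\bar{w}^k := (\bar{u}^k, \bar{v}^k, \bar{\lambda}^k)$ be the sequence generated by (SADMM)-(33). If
$\tau_k \in (0, 1)$ and $\gamma_k, \beta_k, \rho_k, \eta_k \in \mathbb{R}_{++}$ satisfy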


(1 - Tfc)(l + ‘2Tk)rjkPk > 2 r|, 


7fc+i > 


3 2rfe 




0k+l 


> (1 - Tk)Pk, and jk+i > 




(34) 


then the following non-monotone gap reduction condition holds 




(35) 


where $G_{\gamma_k\beta_k}$ is defined by (13), and $D_f$ is defined by (12).


5.2 Updating parameters 

The second step of our algorithmic design is to derive an update rule for the parameters to
satisfy the conditions (34). Lemma 6 shows one possibility to update these parameters, whose
proof is given in Appendix 8.4.2.

Lemma 6 Let $b_{\mathcal{U}}$ be chosen such that $L_b = 1$ and $\gamma_1 > 0$. Then, for $k \ge 1$, the parameters $\tau_k$, $\gamma_k$,
$\beta_k$, $\rho_k$ and $\eta_k$ updated by

$$ \tau_k := \frac{3}{k+4}, \quad \gamma_k := \frac{3\gamma_1}{k+2}, \quad \beta_k := \frac{6\|A\|^2(k+3)}{\gamma_1(k+1)(k+10)}, \quad \rho_k := \frac{9\gamma_1}{2\|A\|^2(k+3)(k+4)}, \quad \eta_k := \frac{3\gamma_1}{2\|A\|^2(k+3)}, \tag{36} $$

satisfy (34). Moreover, $\beta_k < \frac{6\|A\|^2}{\gamma_1(k+1)}$, and the convergence rate of $\{\tau_k\}$ is optimal.


We note that we have freedom to choose $\gamma_1$ in order to trade off the primal objective residual
$f(\bar{x}^k) - f^{\star}$ and the primal feasibility gap $\|A\bar{u}^k + B\bar{v}^k - c\|$ as in Algorithm 1.

5.3 The smoothing ADMM algorithm 

Similar to Algorithm 1, we can combine the third line of (SADMM) and the second line of
(33) to update $\lambda^*_k$. In this case, the arithmetic cost per iteration of Algorithm 2 is essentially
the same as in the standard ADMM scheme (43). We also use $\bar{w}^1 := (\bar{u}^1, \bar{v}^1, \bar{\lambda}^1)$ computed by
(19) at the first iteration. By putting (19), (36), (SADMM), (33) and (22) together, we obtain
a complete SADMM algorithm as presented in Algorithm 2.
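Algorithm 2 (Smoothing Alternating Direction Method of Multipliers (SADMM))

Initialization:

1: Fix $u_c \in \mathrm{dom}(g)$. Choose $\hat{\lambda}^0 \in \mathbb{R}^n$ and $\gamma_1 > 0$. Set $\eta_0 := \frac{\gamma_1}{2\|A\|^2}$.

2: Compute $\bar{u}^1 := \mathrm{prox}_{\gamma_1^{-1} g}\big( u_c + \gamma_1^{-1} A^T\hat{\lambda}^0 \big)$.

3: Solve $\bar{v}^1 := \arg\min_{v} \big\{ h(v) - \langle \hat{\lambda}^0, Bv \rangle + \frac{\eta_0}{2}\|A\bar{u}^1 + Bv - c\|^2 \big\}$. Set $\hat{v}^1 := \bar{v}^1$.

4: Update $\bar{\lambda}^1 := \hat{\lambda}^0 - \eta_0(A\bar{u}^1 + B\bar{v}^1 - c)$ and $\lambda^*_1 := \beta_1^{-1}(c - A\bar{u}^1 - B\bar{v}^1)$.

Iteration: For $k = 1$ to $k_{\max}$, perform:

5: Compute $\tau_k := \frac{3}{k+4}$, $\gamma_{k+1} := \frac{3\gamma_1}{k+3}$ and $\beta_k := \frac{6\|A\|^2(k+3)}{\gamma_1(k+1)(k+10)}$. Then, set $\eta_k := \frac{3\gamma_1}{2\|A\|^2(k+3)}$ and
$\rho_k := \frac{9\gamma_1}{2\|A\|^2(k+3)(k+4)}$.

6: Set $\hat{\lambda}^k := (1 - \tau_k)\bar{\lambda}^k + \tau_k\lambda^*_k$.

7: Solve $\hat{u}^{k+1} := \arg\min_{u} \big\{ g(u) - \langle \hat{\lambda}^k, Au \rangle + \frac{\rho_k}{2}\|Au + B\hat{v}^k - c\|^2 + \gamma_{k+1} b_{\mathcal{U}}(u, u_c) \big\}$.

8: Solve $\hat{v}^{k+1} := \arg\min_{v} \big\{ h(v) - \langle \hat{\lambda}^k, Bv \rangle + \frac{\eta_k}{2}\|A\hat{u}^{k+1} + Bv - c\|^2 \big\}$.

9: Update $\bar{\lambda}^{k+1} := \hat{\lambda}^k - \eta_k(A\hat{u}^{k+1} + B\hat{v}^{k+1} - c)$.

10: Compute $\lambda^*_{k+1} := \beta_{k+1}^{-1}\big[ (1 - \tau_k)\beta_k\lambda^*_k + \tau_k\eta_k^{-1}(\bar{\lambda}^{k+1} - \hat{\lambda}^k) \big]$.

11: Update $\bar{u}^{k+1} := (1 - \tau_k)\bar{u}^k + \tau_k\hat{u}^{k+1}$ and $\bar{v}^{k+1} := (1 - \tau_k)\bar{v}^k + \tau_k\hat{v}^{k+1}$.

End for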


5.4 Convergence analysis 

The following theorem, whose proof is in Appendix 8.4.3, shows the worst-case
iteration-complexity of Algorithm 2.

Theorem 2 Let $\{(\bar{x}^k, \bar{\lambda}^k)\}$ be the sequence generated by Algorithm 2. Then the following
estimates hold


- r 


< 371 

- (fe+2) 


\\u —u 

2 


*i|2 27D^ 

+ 8||A|P(fc+3) 




+ 


6||A|| 

(fc+1) 


+ 


27Dj 

8||A||2{/c-flO)’ 


(37) 


where $D_f$ is given by (12). If $\gamma_1 := \|A\|$, then the worst-case iteration-complexity of Algorithm 2 to
achieve an $\varepsilon$-solution $\bar{x}^k$ of (2) is $\mathcal{O}(\varepsilon^{-1})$.


As can be seen from Theorem 2 the term 


B 


6||A|| 


+ 


27D, 


8||A|P(fc+10)) 


1/2 


in (37) does 


not depend on the choice of $\gamma_1$. If we decrease $\gamma_1$, then the upper bound of $f(\bar{x}^k) - f^{\star}$ decreases,
while the upper bound of $\|A\bar{u}^k + B\bar{v}^k - c\|$ increases, and vice versa. Hence, $\gamma_1$ trades off these
worst-case bounds. The convergence rate guarantee on the dual objective residual can be easily
obtained from the last bound of (15).

5.5 SAMA vs. SADMM 

There are at least two cases where SAMA theoretically gains advantages over SADMM.
First, if $A$ is non-orthogonal. In this case, the $u$-subproblem in (SAMA) can be computed
by using $\mathrm{prox}_g$, while in SADMM, the non-orthogonal operator $A$ prevents us from using
$\mathrm{prox}_g$. Second, if $g$ is block separable, i.e., $g(u) := \sum_i g_i(u_i)$, then we can choose
$g_{\gamma}(u) := \sum_i \big[ g_i(u_i) + \frac{\gamma}{2}\|u_i - u_i^c\|^2 \big]$, which can be evaluated in parallel, without sacrificing the
results of Theorem 1. This is not preserved in SADMM.

6 Numerical evidence 

We illustrate a geometric invariant property of Algorithm 1 and Algorithm 2 for solving the
distance minimization problem (1). This problem is classical, but solving it efficiently remains an
interesting research topic. Various algorithms have been proposed, including Douglas-Rachford
(DR) splitting, Dykstra’s projection, and Hauzageau’s method, as mentioned previously [Hll].
In this section, we compare our algorithms with these methods.

We revisit the problem (1), which has a key application in convex optimization:

$$\text{Find } z^\star \text{ such that } z^\star \in \mathcal{C}_1 \cap \mathcal{C}_2, \tag{38}$$

where $\mathcal{C}_1$ and $\mathcal{C}_2$ are two nonempty, closed and convex sets in $\mathbb{R}^n$. If we assume that $\mathcal{C}_1 \cap \mathcal{C}_2 \neq \emptyset$,
so that (38) has a solution, then we know that the optimal value of (1) is zero. Moreover, our
primal template (2) for (1) then takes the following form

$$\min_{u, v}\big\{ s_{\mathcal{C}_1}(u) + s_{\mathcal{C}_2}(v) \;:\; u + v = 0,\ u \in \mathcal{B}_1,\ v \in \mathcal{B}_1 \big\}, \tag{39}$$

where $s_{\mathcal{C}_i}$ is the support function of $\mathcal{C}_i$ for $i = 1, 2$ and $\mathcal{B}_r := \{w : \|w\| \le r\}$ for $r > 0$.

Clearly, (39) is fully nonsmooth, since both $s_{\mathcal{C}_1}$ and $s_{\mathcal{C}_2}$ are convex and nonsmooth. Here, we can even
increase the constraint radius, currently 1, to a sufficiently large number such that the constraints
$u, v \in \mathcal{B}_r$ of each subproblem in (43), (SAMA) and (SADMM) are inactive without changing
the underlying problem. In this particular setting, we can choose the center points for $u$ and $v$
as zero since they actually attain the optimal solution.

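If we apply ADMM to solve (39), then it can be written explicitly as

$$\begin{cases}
u^{k+1} := \mathrm{prox}_{\rho^{-1}s_{\mathcal{C}_1}}(\lambda^k - v^k) = \lambda^k - v^k - \rho^{-1}\pi_{\mathcal{C}_1}\big(\rho(\lambda^k - v^k)\big),\\
v^{k+1} := \mathrm{prox}_{\rho^{-1}s_{\mathcal{C}_2}}(\lambda^k - u^{k+1}) = \lambda^k - u^{k+1} - \rho^{-1}\pi_{\mathcal{C}_2}\big(\rho(\lambda^k - u^{k+1})\big),\\
\lambda^{k+1} := \lambda^k - (u^{k+1} + v^{k+1}),
\end{cases}$$

where $\pi_{\mathcal{C}_i}$ is the projection onto $\mathcal{C}_i$ for $i = 1, 2$, and $\rho > 0$ is the penalty parameter. Clearly,
multiplying this expression by $\rho$ and using the same notation, we obtain

$$\begin{cases}
u^{k+1} := \lambda^k - v^k - \pi_{\mathcal{C}_1}(\lambda^k - v^k),\\
v^{k+1} := \lambda^k - u^{k+1} - \pi_{\mathcal{C}_2}(\lambda^k - u^{k+1}),\\
\lambda^{k+1} := \lambda^k - (u^{k+1} + v^{k+1}),
\end{cases} \tag{40}$$

which shows that this scheme is independent of the parameter $\rho$. With an elementary
transformation, we can write (40) as a Douglas-Rachford (DR) splitting scheme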








To recover and from z^ and A*^, we can use := A^“^ — z^ and := z^~^ — A^. 


( 41 ) 
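
Both the ADMM scheme (40) and the SAMA steps for (39) evaluate the proximal operator of a support function through a projection, via the Moreau identity $\mathrm{prox}_{t s_{\mathcal{C}}}(x) = x - t\,\pi_{\mathcal{C}}(x/t)$. The Python sketch below checks this identity in one dimension for an interval $\mathcal{C} = [lo, hi]$. The interval, the brute-force grid comparison, and all function names are illustrative assumptions and not part of the paper.

```python
def project_interval(x, lo, hi):
    """Euclidean projection of the scalar x onto [lo, hi]."""
    return min(max(x, lo), hi)


def support_interval(y, lo, hi):
    """Support function of [lo, hi]: sup over c in [lo, hi] of c*y."""
    return max(lo * y, hi * y)


def prox_support(x, t, lo, hi):
    """prox of t*s_C at x, computed through the Moreau identity."""
    return x - t * project_interval(x / t, lo, hi)


def prox_brute_force(x, t, lo, hi, radius=10.0, steps=40001):
    """Grid minimizer of s_C(y) + (y - x)**2 / (2*t), used only as a check."""
    best_y, best_val = None, float("inf")
    for j in range(steps):
        y = -radius + 2.0 * radius * j / (steps - 1)
        val = support_interval(y, lo, hi) + (y - x) ** 2 / (2.0 * t)
        if val < best_val:
            best_y, best_val = y, val
    return best_y
```

The closed form needs only one projection per evaluation, which is why each iteration of (40) costs two projections.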
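Now, if we apply our SAMA to solve (39) using $b_U(u, u_c) := (1/2)\|u - u_c\|^2$, the two main
steps of SAMA become

$$\begin{cases}
\hat{u}^{k+1} := \mathrm{prox}_{\gamma_{k+1}^{-1}s_{\mathcal{C}_1}}\big(u_c + \gamma_{k+1}^{-1}\hat{\lambda}^k\big) = u_c + \gamma_{k+1}^{-1}\hat{\lambda}^k - \gamma_{k+1}^{-1}\pi_{\mathcal{C}_1}\big(\hat{\lambda}^k + \gamma_{k+1}u_c\big),\\
\hat{v}^{k+1} := \mathrm{prox}_{\eta_k^{-1}s_{\mathcal{C}_2}}\big(\eta_k^{-1}\hat{\lambda}^k - \hat{u}^{k+1}\big) = \eta_k^{-1}\hat{\lambda}^k - \hat{u}^{k+1} - \eta_k^{-1}\pi_{\mathcal{C}_2}\big(\hat{\lambda}^k - \eta_k\hat{u}^{k+1}\big).
\end{cases} \tag{42}$$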


Clearly, the standard AMA is not applicable to solve (39) due to the lack of strong convexity.
The standard ADMM applied to (39) becomes the alternating projection scheme (41) for
solving (38). This scheme can be arbitrarily slow if the geometry between the two sets $\mathcal{C}_1$ and $\mathcal{C}_2$
is ill-posed.

To observe an interesting convergence behavior, we test the Dykstra projection, the Hauzageau
method, and the ADMM (40) (or its DR form (41)), and compare them with our algorithms
on the following configuration.

We first choose $\mathcal{C}_i := \{u \in \mathbb{R}^n : \langle a_i, u\rangle \le b_i\}$ for $i = 1, 2$ as two half-planes in $\mathbb{R}^n$, where $b_1 =
b_2 = 0$. Here, the normal vectors are $a_1 := (\varepsilon, \cdots, \varepsilon, -1, \cdots, -1)^\top$ and $a_2 := (0, \cdots, 0, 1, \cdots, 1)^\top$,
where $\varepsilon > 0$ is a positive angle. The tangent angle $\varepsilon$ is repeated $\lfloor n/2\rfloor$ times in $a_1$, and the zero
is repeated $\lfloor n/2\rfloor$ times in $a_2$, where $n = 1000$. The starting point is chosen as $(1, \cdots, 1)^\top$.
By varying $\varepsilon$, we can observe the convergence behavior of these five methods.
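
The configuration above can be set up in a few lines. The following Python sketch builds the two normal vectors, projects onto a half-space $\{u : \langle a, u\rangle \le b\}$ in closed form, and evaluates the distance objective $d(\lambda) = d_{\mathcal{C}_1}(\lambda) + d_{\mathcal{C}_2}(\lambda)$ of (1). It is a sketch under our own assumptions: the function names are hypothetical, and the remaining $n - \lfloor n/2\rfloor$ entries of each normal vector are filled with $-1$ and $1$, respectively.

```python
import math


def make_normals(n, eps):
    """Return a1 = (eps,...,eps,-1,...,-1) and a2 = (0,...,0,1,...,1)."""
    half = n // 2
    a1 = [eps] * half + [-1.0] * (n - half)
    a2 = [0.0] * half + [1.0] * (n - half)
    return a1, a2


def project_halfspace(x, a, b=0.0):
    """Euclidean projection of x onto {u : <a, u> <= b}."""
    violation = sum(ai * xi for ai, xi in zip(a, x)) - b
    if violation <= 0.0:
        return list(x)  # x is already inside the half-space
    scale = violation / sum(ai * ai for ai in a)
    return [xi - scale * ai for xi, ai in zip(x, a)]


def distance_objective(lam, a1, a2):
    """d(lam) = dist(lam, C1) + dist(lam, C2) for the two half-spaces."""
    return (math.dist(lam, project_halfspace(lam, a1))
            + math.dist(lam, project_halfspace(lam, a2)))
```

At the all-ones point with $n = 1000$ and $\varepsilon < 1$, the point already lies in $\mathcal{C}_1$, so the objective equals its distance to $\mathcal{C}_2$.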

We note that the Dykstra and Hauzageau algorithms directly solve the dual problem (1),
while our methods and ADMM solve both the primal and dual problems (39) and (1). We
compare these algorithms on the absolute dual objective residual $d(\lambda) - d^\star$ of (1).

Figure 1 shows the convergence of the five algorithms with different choices of $\varepsilon$ on the absolute
objective residual $d(\lambda) - d^\star$ of (1).

We observe that Hauzageau’s and Dykstra’s methods are both slow, with Hauzageau’s method
being extremely slow. The speed of ADMM (or DR splitting) strongly depends on the geometry
of the sets, in particular the tangent angle between the two sets. For large values of $\varepsilon$, these
methods work well, but they become arbitrarily slow as $\varepsilon$ decreases. The objective value
of this method drops quickly to a certain level, then saturates, and makes very slow
progress toward the optimal value, as seen in Figure 1. Since the ADMM scheme (40) is
independent of its penalty parameter, this is the best performance we can achieve for solving
(1). Both SAMA and SADMM have almost identical convergence rates for different values of $\varepsilon$.
These convergence rates reflect the theoretical guarantee, which is $\mathcal{O}(1/k)$, as predicted by our
theoretical results.
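
One simple way to read an empirical rate off residual curves of this kind is a least-squares fit of the log residual against the log iteration count: a slope close to $-1$ is consistent with an $\mathcal{O}(1/k)$ rate. The Python sketch below uses synthetic residuals $r_k = 5/k$ rather than the data behind Figure 1, and the helper name is hypothetical.

```python
import math


def loglog_slope(residuals, first_iter=1):
    """Least-squares slope of log(residual) versus log(iteration index)."""
    xs = [math.log(first_iter + i) for i in range(len(residuals))]
    ys = [math.log(r) for r in residuals]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Synthetic residuals decaying exactly like 5/k for k = 1, ..., 200.
synthetic = [5.0 / k for k in range(1, 201)]
slope = loglog_slope(synthetic)
```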


7 Discussion 

We have developed a rigorous alternating direction optimization framework for solving
constrained convex optimization problems. Our approach is built upon the model-based gap
reduction (MGR) technique in [32], and unifies five main ideas: smoothing, gap reduction, alternating
direction, acceleration/averaging, and homotopy. By splitting the gap, we have developed two
new smooth alternating optimization algorithms, SAMA and SADMM, with rigorous convergence
guarantees. One important feature of these methods is a heuristic-free parameter update,
which has not been proved yet in the literature for AMA and ADMM, as we discuss below:

Alternating direction method of multipliers (ADMM): The ADMM algorithm can be viewed as
the Douglas-Rachford splitting applied to the dual problem of (2). As a result, the standard
ADMM algorithm generates a primal-dual sequence as

Fig. 1 The convergence behavior of five algorithms with different values of $\varepsilon$. These plots correspond to
$\varepsilon = 10^{-1}, 10^{-2}, 10^{-3}$, or $10^{-4}$.


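$$\begin{cases}
u^{k+1} := \displaystyle\arg\min_{u \in \mathrm{dom}(g)}\Big\{ g(u) - \langle\lambda^k, Au\rangle + \tfrac{\eta_k}{2}\|Au + Bv^k - c\|_2^2 \Big\},\\
v^{k+1} := \displaystyle\arg\min_{v \in \mathrm{dom}(h)}\Big\{ h(v) - \langle\lambda^k, Bv\rangle + \tfrac{\eta_k}{2}\|Au^{k+1} + Bv - c\|_2^2 \Big\},\\
\lambda^{k+1} := \lambda^k - \eta_k\big(Au^{k+1} + Bv^{k+1} - c\big),
\end{cases} \tag{43}$$

where $k$ denotes the iteration count and $\eta_k > 0$ is a penalty parameter. This basic method
is closely related or equivalent to many other algorithms, such as Spingarn’s method of
partial inverses, Dykstra’s alternating projections, and Bregman iterative algorithms, and can also
be motivated from the augmented Lagrangian perspective [5].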
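
To make the three updates in (43) concrete, the following Python sketch runs them on a scalar toy instance of the template: $g(u) = \tfrac{1}{2}(u - a)^2$, $h(v) = \tfrac{1}{2}(v - b)^2$, $A = B = 1$, with a constant penalty $\eta_k = \eta$. The toy problem, the closed-form subproblem solutions, and the function name are our own illustrative assumptions. The sketch only shows the order of the updates and says nothing about the rates discussed in this section.

```python
def admm_toy(a, b, c, eta=1.0, iters=100):
    """Run scheme (43) on min 0.5*(u-a)**2 + 0.5*(v-b)**2  s.t.  u + v = c."""
    u = v = lam = 0.0
    for _ in range(iters):
        # u-step: stationarity (u - a) - lam + eta*(u + v - c) = 0
        u = (a + lam - eta * (v - c)) / (1.0 + eta)
        # v-step: same form, but it already uses the fresh u
        v = (b + lam - eta * (u - c)) / (1.0 + eta)
        # multiplier step with the sign convention of (43)
        lam = lam - eta * (u + v - c)
    return u, v, lam
```

For this instance the optimality conditions give $\lambda^\star = (c - a - b)/2$, $u^\star = a + \lambda^\star$ and $v^\star = b + \lambda^\star$, which the iterates approach.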

The ADMM algorithm serves as a good general-purpose tool for optimization problems 
arising in the analysis and processing of modern massive datasets. Indeed, its implementations 
have received a significant amount of engineering effort both in research and in industry. As a
result, its global convergence rate characterization for the template (2) is an active research
topic: c/., |8l[9l[T0llTTllT3llT^IT71IT8llMll29ll^ . 

In the constrained setting of (2), a global convergence characterization specifically means
the following: The algorithm provides us $x^k := (u^k, v^k)$ and we determine the number of
iterations $k$ necessary to obtain $f(x^k) - f^\star \le \varepsilon_f$ and $\|Au^k + Bv^k - c\| \le \varepsilon_c$ for some fixed accuracy $\varepsilon_f$
for the objective and for some (possibly different) fixed accuracy $\varepsilon_c$ for the linear constraint.
Separate constraint feasibility is crucial for the primal convergence to have any significance;
otherwise, we can trivially have $f(x^k) - f^\star \le 0$ for some infeasible iterate $x^k$.

A key theoretical strategy for obtaining global convergence rates for alternating direction 
methods is ergodic averaging [i[9l[iniinilIl|2l|Ml|29l|35] . For instance, as opposed to working 
with the primal sequence $x^k := (u^k, v^k)$ from (43) directly, we instead choose a sequence of
weights $\{\omega_k\} \subset (0, +\infty)$ and then average as follows:

$$\bar{x}^k := \Big(\sum_{i=0}^{k}\omega_i\Big)^{-1}\sum_{i=0}^{k}\omega_i x^i. \tag{44}$$

The averaged sequence $\{\bar{x}^k\}$ then makes it theoretically elementary to obtain the desired type of
convergence rate characterizations for (2).
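
The weighted average (44) can be maintained incrementally without storing past iterates. The Python sketch below is a minimal illustration of that bookkeeping: the function name and the representation of iterates as lists of floats are our own assumptions, and the two weight choices correspond to the uniform weights $\omega_i = 1$ and the increasing weights $\omega_i = i + 1$.

```python
def ergodic_average(iterates, weight=lambda i: 1.0):
    """Return the weighted average (44) of a list of equally sized vectors."""
    total_weight = 0.0
    avg = [0.0] * len(iterates[0])
    for i, x in enumerate(iterates):
        w = weight(i)
        total_weight += w
        step = w / total_weight
        # Running update: avg_new = avg + (w / W_new) * (x - avg)
        avg = [a + step * (xi - a) for a, xi in zip(avg, x)]
    return avg


uniform = ergodic_average([[0.0], [3.0]])
late_heavy = ergodic_average([[0.0], [3.0]], weight=lambda i: i + 1.0)
```

With weights $\omega_i = i + 1$ the later iterate counts twice as much as the first one in this two-point example, so the average moves toward the most recent iterate.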

Indeed, the existing literature critically relies on such weighting strategies in order to obtain
global convergence guarantees. For instance, He and Yuan in m prove an $\mathcal{O}(1/k)$-convergence
rate of their ADMM scheme (43) by using the form (44) with $\omega_i := 1$ but for both primal and
dual variables $x$ as well as $\lambda$ simultaneously. They provide their guarantee in terms of a gap
function for an associated variational inequality for (2) and assume the boundedness of both
primal and dual domains. This result is further extended by other authors to different variants
of ADMM, including |16II311I3^ . The same rate is obtained in [10] for a relaxed ADMM variant
with similar assumptions along with a weighting strategy that emphasizes the latter iterations
by using $\omega_k := k + 1$ in (44).


We should note that there are also weighted global convergence characterizations for ADMM,
such as $f(x^k) - f^\star + \rho\|Au^k + Bv^k - c\|$ for some fixed $\rho > 0$, as in Shefi and Teboulle [29].
The authors added proximal terms to the $u$- and $v$-subproblems and imposed conditions on
three parameters to achieve the $\mathcal{O}(1/k)$-convergence rate jointly between the objective residual
and feasibility gap. Intriguingly, this type of convergence rate guarantee does not necessarily
imply the $\mathcal{O}(1/k)$-convergence separately on the primal objective residual and feasibility gap,
as indicated in [29, Theorem 5.2], without additional assumptions. We argue that, for general
nonsmooth convex problems like (2), without any acceleration step, it may not be possible to
achieve the best rate as ours with only an averaging scheme for both the primal and dual problems,
as can be seen from the literature.

Interestingly, making additional assumptions on the template is quite common dQKUdi
115] . For instance, the authors in [26] studied a linearized ADMM variant of (43) and proved
the $\mathcal{O}(1/k)$-rate separately, but required the Lipschitz gradient assumption on either $g$ or $h$ in
(2). In addition, the authors in m require strong convexity on both $g$ and $h$. In contrast, the
authors m require the strong convexity of either $g$ or $h$ but need $A$ or $B$ to be full rank as
well. In [36], the authors proposed an asynchronous ADMM and showed the $\mathcal{O}(1/k)$ rate on the
averaging sequence for a special case of (2) where $h = 0$, which trivially has Lipschitz gradient.

Unsurprisingly, these assumptions again limit the applicability of the algorithmic guarantees 
when for instance g and h are non-Lipschitz loss functions or fully non-smooth regularizers, 
as in Poisson imaging, robust principal component analysis (RPCA), and graphical model 
learning [7]. Several recent results rely on other types of assumptions such as error bounds,
metric regularity, or the well-known Kurdyka-Lojasiewicz condition [6l l201fT^ . Although these 
conditions cover a wide range of application models, it is unfortunately very hard to verify some 
quantities related to these assumptions in practice. Other times, the additional assumptions 
obviate the ADMM choice as they can allow application of a simpler algorithm: 


Alternating minimization algorithm (AMA): The AMA algorithm, given below, is guaranteed
to converge when g is strongly convex or g* has Lipschitz gradient m- 
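$$\begin{cases}
u^{k+1} := \displaystyle\arg\min_{u \in \mathrm{dom}(g)}\big\{ g(u) - \langle\lambda^k, Au\rangle \big\},\\
v^{k+1} := \displaystyle\arg\min_{v \in \mathrm{dom}(h)}\Big\{ h(v) - \langle\lambda^k, Bv\rangle + \tfrac{\eta_k}{2}\|Au^{k+1} + Bv - c\|_2^2 \Big\},\\
\lambda^{k+1} := \lambda^k - \eta_k\big(Au^{k+1} + Bv^{k+1} - c\big),
\end{cases} \tag{45}$$

where $\eta_k > 0$ is the penalty parameter.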
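
The difference between (45) and (43) is only in the $u$-step, which drops the quadratic penalty and therefore needs strong convexity of $g$ to be well defined. The Python sketch below runs (45) on the scalar toy instance $g(u) = \tfrac{1}{2}(u - a)^2$, $h(v) = \tfrac{1}{2}(v - b)^2$, $A = B = 1$ with a constant $\eta_k = \eta$. The instance, the step size, and the function name are illustrative assumptions of ours.

```python
def ama_toy(a, b, c, eta=0.5, iters=60):
    """Run scheme (45) on min 0.5*(u-a)**2 + 0.5*(v-b)**2  s.t.  u + v = c."""
    u = v = lam = 0.0
    for _ in range(iters):
        # u-step: pure Lagrangian step, stationarity (u - a) - lam = 0
        u = a + lam
        # v-step: augmented Lagrangian step, as in ADMM
        v = (b + lam - eta * (u - c)) / (1.0 + eta)
        # multiplier step
        lam = lam - eta * (u + v - c)
    return u, v, lam
```

On this instance the multiplier error contracts by the factor $(1 - \eta)/(1 + \eta)$ per iteration, so the choice $\eta = 0.5$ reduces it by a factor of three each time.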

One can view AMA as the forward-backward splitting algorithm applied to the dual problem
of (2) (cf. [15, 34]). Alternatively, we can motivate the algorithm by using one Lagrange dual
step and one augmented Lagrangian dual step between the two groups of variables $u$ and $v$ HIMl
I34| . Computationally, (45) is arguably easier than (43). However, it often requires stronger
assumptions than ADMM to guarantee convergence [151 m- The most obvious assumption is
the strong convexity of $g$.
8 The proof of technical results

This appendix provides the full proofs of the technical results presented in the main text.

8.1 The proof of Lemma 1: The primal-dual bounds

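First, using the fact that $-d(\lambda) \le -d^\star = f^\star \le \mathcal{L}(x, \lambda^\star) = f(x) + \langle\lambda^\star, Au + Bv - c\rangle \le f(x) +
\|\lambda^\star\|\|Au + Bv - c\|$, we get

$$-\|\lambda^\star\|\|Au + Bv - c\| \le f(x) - f^\star \le f(x) + d(\lambda), \tag{46}$$

which is exactly the lower bound in (15).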

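Next, since $A^\top\lambda^\star \in \partial g(u^\star)$ due to the optimality condition, by Fenchel-Young’s inequality, we have $g(u^\star) +
g^*(A^\top\lambda^\star) = \langle A^\top\lambda^\star, u^\star\rangle$, which implies $g^*(A^\top\lambda^\star) = \langle A^\top\lambda^\star, u^\star\rangle - g(u^\star)$. Using this relation and
the definition of $\varphi_\gamma$, we have

$$\begin{aligned}
\varphi_\gamma(\lambda) &:= \max_u\big\{\langle A^\top\lambda, u\rangle - g(u) - \gamma b_U(u, u_c)\big\} \ge \langle A^\top\lambda, u^\star\rangle - g(u^\star) - \gamma b_U(u^\star, u_c)\\
&= \langle A^\top\lambda^\star, u^\star\rangle - g(u^\star) + \langle A^\top(\lambda - \lambda^\star), u^\star\rangle - \gamma b_U(u^\star, u_c)\\
&= g^*(A^\top\lambda^\star) + \langle A^\top(\lambda - \lambda^\star), u^\star\rangle - \gamma b_U(u^\star, u_c)\\
&= \varphi(\lambda^\star) + \langle\lambda - \lambda^\star, Au^\star\rangle - \gamma b_U(u^\star, u_c).
\end{aligned}$$

Alternatively, we have $\psi(\lambda) \ge \psi(\lambda^\star) + \langle\nabla\psi(\lambda^\star), \lambda - \lambda^\star\rangle$, where $\nabla\psi(\lambda^\star) = B\nabla h^*(B^\top\lambda^\star) - c =
Bv^\star - c$ due to the last relation in the optimality condition, where $\nabla h^*(B^\top\lambda^\star) \in \partial h^*(B^\top\lambda^\star)$ is one subgradient of
$h^*$. Hence, $\psi(\lambda) \ge \psi(\lambda^\star) + \langle\lambda - \lambda^\star, Bv^\star - c\rangle$. Adding this inequality to the last estimate with
the fact that $d_\gamma = \varphi_\gamma + \psi$ and $d = \varphi + \psi$, we obtain

$$d_\gamma(\lambda) \ge d(\lambda^\star) + \langle\lambda - \lambda^\star, Au^\star + Bv^\star - c\rangle - \gamma b_U(u^\star, u_c) = d^\star - \gamma b_U(u^\star, u_c). \tag{47}$$

Using this inequality with $d^\star = -f^\star$ and the definition (13) of $f_\beta$, we have

$$\begin{aligned}
f(x) - f^\star &\le f_\beta(x) + d_\gamma(\lambda) + \gamma b_U(u^\star, u_c) - \tfrac{1}{2\beta}\|Au + Bv - c\|^2\\
&= G_{\gamma\beta}(w) + \gamma b_U(u^\star, u_c) - \tfrac{1}{2\beta}\|Au + Bv - c\|^2.
\end{aligned} \tag{48}$$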
Let $S := G_{\gamma\beta}(w) + \gamma b_U(u^\star, u_c)$. Then, combining (46) and (48), we obtain the first inequality
of (15).

Let $t := \|Au + Bv - c\|$. Using again (46) and (48), we can see that $\tfrac{1}{2\beta}t^2 - \|\lambda^\star\|t - S \le
0$. Solving this quadratic inequality w.r.t. $t$ and noting that $t \ge 0$, we obtain the second
bound of (15). The last estimate of (15) is a direct consequence of (48), the first one of (15).
Finally, from (46), we have $f(x) \ge f^\star - \|\lambda^\star\|\|Au + Bv - c\|$. Substituting this into (48) we get
$d(\lambda) - d^\star - \|\lambda^\star\|\|Au + Bv - c\| \le S - \tfrac{1}{2\beta}\|Au + Bv - c\|^2$, which implies

$$d(\lambda) - d^\star \le S - \tfrac{1}{2\beta}\|Au + Bv - c\|^2 + \|\lambda^\star\|\|Au + Bv - c\|.$$

This indeed leads to the last inequality of (15). □
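
The step "solving this quadratic inequality" can be made explicit. Since $\tfrac{1}{2\beta}t^2 - \|\lambda^\star\|t - S \le 0$ with $t \ge 0$ and $S \ge 0$, $t$ cannot exceed the larger root, i.e. $t \le \beta\big(\|\lambda^\star\| + \sqrt{\|\lambda^\star\|^2 + 2S/\beta}\big)$. The Python sketch below evaluates this root. The algebra is ours and has not been checked against the exact form in which (15) states the bound, and the function name is hypothetical.

```python
import math


def feasibility_bound(beta, lam_norm, S):
    """Largest t >= 0 with t**2/(2*beta) - lam_norm*t - S <= 0, for S >= 0."""
    return beta * (lam_norm + math.sqrt(lam_norm ** 2 + 2.0 * S / beta))


# Example values chosen so that the root is easy to verify by hand.
t_max = feasibility_bound(beta=2.0, lam_norm=1.0, S=3.0)
```

For these values the quadratic reads $t^2/4 - t - 3$, whose positive root is $t = 6$.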

8.2 Convergence analysis of Algorithm 1

We provide full proofs of the lemmas and theorems related to the convergence of Algorithm 1.
First, we prove the following key lemma, which will be used to prove Lemma|^ 

Lemma 7 Let A^"^^ be generated by (jSAMA). Then 

< (1 - rfe)d^;^,(A'=)+rfcl^^,(A) + i(A'=+l-A^ (l-rfe)A'=+rfeA-A'=) 


u 


/c +1 II 2 


(49) 


where 


+ {VvP7^,(P), A - A'^) + V>(A) 


^ d'jhi-i 


{X)-^-f^\\u*{A^X)- 


(50) 


(51) 


In addition, for any z, ')kyTk+i > 0> ^^6 function defined by ( |10| ) satisfies 

S7rs.i {z) < 9*k (^) + (7fc - 7fc+l)^(wWi ( 2 )- w"")- 

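Proof First, it is well-known that SAMA is equivalent to the proximal-gradient step applied
to the smoothed dual problem

$$\min_{\lambda}\big\{\varphi_{\gamma_{k+1}}(\lambda) + \psi(\lambda) : \lambda \in \mathbb{R}^n\big\}.$$

This proximal-gradient step can be presented as

$$\bar{\lambda}^{k+1} := \mathrm{prox}_{\eta_k\psi}\big(\hat{\lambda}^k - \eta_k\nabla\varphi_{\gamma_{k+1}}(\hat{\lambda}^k)\big).$$

We write down the optimality condition of the corresponding minimization problem of this
step as

$$0 \in \partial\psi(\bar{\lambda}^{k+1}) + \nabla\varphi_{\gamma_{k+1}}(\hat{\lambda}^k) + \eta_k^{-1}(\bar{\lambda}^{k+1} - \hat{\lambda}^k).$$

Using this condition and the convexity of $\psi$, for any $\nabla\psi(\bar{\lambda}^{k+1}) \in \partial\psi(\bar{\lambda}^{k+1})$, we have

$$\begin{aligned}
\psi(\bar{\lambda}^{k+1}) &\le \psi(\lambda) + \langle\nabla\psi(\bar{\lambda}^{k+1}), \bar{\lambda}^{k+1} - \lambda\rangle\\
&= \psi(\lambda) + \langle\nabla\varphi_{\gamma_{k+1}}(\hat{\lambda}^k), \lambda - \bar{\lambda}^{k+1}\rangle + \eta_k^{-1}\langle\bar{\lambda}^{k+1} - \hat{\lambda}^k, \lambda - \bar{\lambda}^{k+1}\rangle.
\end{aligned} \tag{52}$$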

Next, by the definition <p(A) := g*{A^X), we can show from (10) that (A^A^). 

Since g* is (l/ 7 )-Lipschitz gradient continuous, we have 

|||V57(^) - V 57 ( 2 )f < g*{z)-gj{z) - {Vgj{z),z- z) < ^||2 - zf. 
Using this inequality with 7 := 7 ^+ 1 , A) = A), A'') 


u^+^, and Vv57fc+i(A) = A\7g^^^^{A^X), we have 

HI (A) 


‘^7fc+i V^V ~ '' y7fc+ 

(A^A) - £'=+1 f < (A) - (A'^) - (A'^), A - a7 

< 1 HATf^ \fcM|2 ^ ||A||U|\ \fe||2 


(53) 


Using ([ 5 I with A = A'=+\ we have 


< 957-=+! (a'^) + (Vvs^^^a'^), a'^+1 - a'=) + - A'^f • 


Summing up this inequality and (521 and using the definition of ^* 7^+1 (A) in (50) and d(-) in 
(54), we obtain 


d7.+H (a'=+') < ^7.+! (A) + ^ (a"+' - A^ a - a'^) - ( ^ - ^ 1 IIA"+^ - A 

Vk V Vk 27fc+i J 


iia''+^ -. 


fc ||2 


(54) 


Here, the second inequality in (50) follows from the right-hand side of (531. 


Now, using (|54[) with A := X^ and then combining with (50), we get 


rf7^H (a"+ 7 < rf7^H (A'^) + 4{A'=+' - a7 a'^ - A'^) - (i - 1^) IIA'^+1 - A 


fc ||2 


Multiplying the last inequality by 1 — r/t G [0,1] and (54) by r/^ G [0,1], then summing up the 
results, we obtain (49). 


Finally, from (10), since gi/{z) := maxu { 77 ( 11 , 7 ; 2 ) ;= (z,u) — g{u) — 767.7 («; m'^)}, is the max¬ 
imization of g{-) over u indexing in 7 and z, which is concave u and linear in 7 , we have g*{z) is 

convex w.r.t. 7 > 0. Moreover, = —bij(uj(z),u‘^). Hence, using the convexity of g* w.r.t. 

7 > 0 , we have 57('*) ^ 5 ^ («) “ (ifc - lk+i)bu{u'^iz),u‘'), which is indeed ([sT). 


8.2.1 The proof of Lemma^ Bound on Gryp for the first iteration 

Since := ('u^, 7 }^,A^) is updated by (19), similar to (SAMA), we can use (54) with fc = 1, 
A := A° and (A°) < d-yi (A°) to obtain 


d,7A') < rf7H(A°) - ( ^ - ^ ) IIA^ - A°"^ 


Since solves the second problem in (191 and ^ dom(/i), we have 


h{v*{X°)) -{X°,Bv*{X°)) + ^\\Au^ +Bv*iX°)-cf > h{v^) 
-{X°,Bh^) + 'l^WAu^ + Bv^ - cf + 'f\\B{v*Cx°) 


Using Df in (12), this inequality implies 


(55) 


V(h'^A°) < (X°,Bv^) - h(v^) - ^IIAu^Bw^-cf + ^HAh^+Bv^-cjlDf. (56) 
Using the definition of $d_\gamma$, we further estimate (55) using (56) as follows:
dm (A') (A°) + V'(A°) - || A^ - A° 


\ 0||2 




l|A^-A°|| 


Vo 271 

^ {X°,Au^ + Bv^ — c) — g{u^) — h{v^) — ')\hu{u^ ,xf) 


< + ^ + I|A'-A°f + 4(A°,P - A°) + ^D}. 


Since {w^) = {x^)+d^^ (A^), we obtain (20 1 from the last inequality. If /3i > rio(5'yi-2\\A\\^rjo) - 

then (20) leads to + ^{A°, A^ — A°). □ 

8.2.2 The proof of Lemma i- Gap reduction condition 

For notational simplicity, we first define the following abbreviations 




fc+i 

•* 

'fe+i 


Dh 


= Am'' + Bm'' - c 
= Am''+^+B£^+^-c 

= the solution of ( 10 ) at A* 


= m*(A^) £ dh*{A^\^) a subgradient of h* defined by (|^ at A^A*, and 


= ||Am''+i + B{2vl - t)''+i) - c||. 


From 


SAMA we have A^"*"^ — A^ = rjf^{c — Am^“*"^ — Bii^'^^) = —rn^z^'^^. In addition, by (16), we 
have A^ = (1 — rj.)A^ + ''■feA^, which leads to (1 — rj.)A^ + r/jA^ — A* = Tfc(A^ — A^). Using these 
expressions into ( |49[ ) with A := A^, and then using ( |50[ ) with ^ 7 fc+i(A^) < dj^_^_j^{X^), we obtain 

d7rm {A"+') < (1 - rk)dy.^, (A^') + {X^) - rk{z'^+\Xl - A^') 

-Vk (1 - - (1 - r,y-^\\ul^, - y+Y- 


By (511 with the fact that ip-y{X) := gf/{A^X), for any 'yk+i > 0 and 7 ^ > 0, we have 

¥57m-i(A^) < ‘P7fc(A^) + (7fc - 7fe+l)fcw(“pl,Mc). 

Using this inequality and the fact that dj{-) := V 57 (-) -j- ip{■), we have 

d7M-i(A^) ^ d.Jky'") + (7fc - lk+i)buiu*k+i,Uc). 

Next, using from 


(58) 


SAMA 


and its optimality condition, we can show that 


h^B^Xy -nt\\Ay+^+Bvl-cf = {B^X\vl)-h{vl) - f ||Am''+i + Bm^ - cf 
< (B'^A^^i''+l) - h{y+^) - ’^||Am''+i+Bm''+1-cP - ^||B(m* - 

Since V>(A) := h*{B^X) — cFX, this inequality leads to 

yxy < (B^A'',m''+1)-(c,P)-/i(m''+1)-^P*'+iP-^(5''+i,Am''+i + B(2m* -m''+P) 
< (A^BM''+l - c) - h{y+^) - f 
Now, by this estimate, $d_\gamma(\cdot) = \varphi_\gamma(\cdot) + \psi(\cdot)$ and SAMA, we can derive

= -/(5'=+') + (A^5'=+l) - f M pfc+i II TAfc («'=+!, sp. 


Combining this inequality, ( |57[ ) and (58 1 , we obtain 

dyUV^+^) < {l-rk)dy,{p)-Tkf{x^+^)+rk{\l,z>^+^) 


Tk ll^llPfcA ||?fe+l||2 


’7fc(^l+ 2 2yhki , 

- Tklk+ibuiu ^'^^> m'')-( 1 -T-fc)(7fe - 7fc+i(“fc+i> “c) 


(59) 




fc+1 I I 2 _|_ T^T]}^ I I ;7 fc+l 


pfe. 


Now, using the definition Gk, we have 

- AJs'=)+d^jA'=) = /(i'= 

I ^ f\k\ I 1 ||-fc ||2 


Gk{w'") ■= fi3k{^'")+dyk{>^'') = /(a;*") + d^jA'") + ^\\Au'^ + Bv’^ - cp 


= /(^^) + dyk{^'‘) + ^ 11 ^ 

Let us define AG^ := (1 — Tk)Gk{w^) — Gk+iiw^^^)- Then, we can show that 

AGk = (1 - rk)f{x>^) + (1 - rk)dy,{X>^) - f{x’^+^) - d^^pA'^+p 




(60) 


W 


2/3A>fi 


By ( |16[ ), we have = (1 — Tk)z^ + Using this expression and the condition fik+i > 

(1 - Tk)ldk in we can easily show that 


0_p70 ||jfc ||2 _ 
Wk " " 


t ||-fe+l||2^ Xk,^k+1 -fe\ 

^||. II .O- 


-fe+l ||2 


2l3k{l-rk)' 


Substituting this inequality into (601, and using the convexity of /, we further get 
AGfc > (l-rfe)dy,(P) -d^^,(A'=+p-rfe/(i'=+p 


_ Tfc /-K+i -/C\_ 

/ 2 ( 1 - 


(61) 


G-rk)/3k I 


Substituting (591 into (611 and using A^ := -^{c — Au^ — Bv^) = —^ 2 *, we obtain 
Tk l|Ap?7fc 




7 


27 fe+i / 2(1 - rfc)/3fc 


+ Afc - 


( 62 ) 


where 


Rk ■■— 


._ (1 -"^fc) 


7fc+ill“fc+i “ + TkJk+ibui 


'fc-fl -C 


) - (1 - 'rfe)(7fe - 7fe+i)^W(’^fc+i-“'')• 


Furthermore, we have 


%||~fe+l||2 Tfc%|| fc+l| 


Dk = "fiWz^+^W - TkDk]" - 


2 VkTkDl ^ _rikTlDl 


2 


4 


4 
Using this estimate in (62), we finally get

PIlV' 




27fc+i 


0 


T-fc 


2(1 - Tk)Pk 




(63) 


Next step, we estimate R^- Let := u\yy — Wc, Sfc := — Uc- Using the smoothness of bn 

we can estimate R^. explicitly as 

27j!c-f-ll^fc (1 '^h) II II (1 (7/e-|-i7fc 1)L;> llo^fcll “fi’^fc||fifcll 

(1 - rfc)afe|p + (1 - Tfe) {rk - - l)L&j l|a/clP- 


(64) 


= a — ( i — ' 


By the condition (1 + L^^r^,) 7 fc_|_i > 7 ^, in (17), we have — ( 7 ^(^\ 7 fc — 1)7/6 > 0- Using this 
condition in (64), we obtain R^. > 0. Finally, by (12) we can show that Dj, < Df. Using this 
inequality, -Rfe > 0 and the second condition of (17), we can show from (62) that AGk > 
_!Zi^£) 2 , implies ( |18[ ). □ 

8.2.3 The proof of Lemma^ Parameter updates 

The tightest update for 7 ^, and /3fc is 7^+1 := and Pk+i ’■= (1 “ ''"fe)/3fc due to (17). Using 


these updates in the third condition in ( |17[ ) leads to . By directly checking 

this condition, we can see that r^, = 0 {l/k) which is the optimal choice. 

Clearly, if we choose tj, := then 0 < r^. < 1 for k > 0 and tq = 3/4. Next, we choose 

7 fc+i := i+rl /3 - T+ 7 ^' Substituting ^ into this formula we have 7^+1 = 7 fc- By 

induction, we obtain 7^+1 = |^. This implies % = 2 \\An^(k+ 5 ) ■ and 7^+1 = §^, 


we choose from the third condition of (17) as /Ij, = for k > 1. 


Using the value of and /3fc, we need to check the second condition /3fc_|_i > (1 — Tk)Pk of 0 - 
Indeed, this condition is equivalent to 2k^ + 28k + 88 > 0, which is true for all fc > 0. From the 


update rule of /l^, it is obvious that . 

8.2.4 The proof of Theorem 1: Convergence of Algorithm 1


□ 


4571 


We estimate the term in ( |18[ ) as 

2 ^ _ 4571 _ _ 

^kVk 2||A||2(fe+4)2{fc+5) 2P||2(fc+4)(fe+5) 

Combining this estimate and (18), we get

4571 


- (l-Tfe) 


4571 


2||A||2(fc+3)(fc + 4)' 


Gfe+i(m'=+i)- 


8||A||2(fc + 4)(fe + 5) 


< (l-'Tfc) 


Gfc(w'') - 


4571 


8P||2(fc + 3)(fc + 4) 


By induction, we have Gfc(w'=)-^ \n,(,7d\ -?7l 


8||A|| 
fe-l / 


qfc+ 3 Kfc+ 4 ) ^ ^k[Gi{w ) - 32 ^< 0 whenever 


Gi(ui^) < 41^(4112 7)/, where u>k ■= 11^=1 (f “ p)- Hence, we finally get 


Gk{w^) < 


4571 

8||A||2(fc + 3)(fc + 4)- 


(65) 
Since rjo = 2 \\A\\'^ ’ satisfies the condition 571 > 2?7ol!^||^ in Lemma In addition, from 
Lemma 1^ we have P\ = ‘^’^ 20^1 ^ ; which satisfies the second condition in Lemma We 

also note that /3k < . If we take A° = 0 ™, then Lemma shows that < 

~ 4 || 3^||2 -P/ < 32^|'^||2 -D^. Using this estimate and (651 into Lemma [l| we obtain ( pd] ). 
Finally, if we choose 71 := || j 4 || , then we obtain the worst-case iteration-complexity of Algorithm 
[^isC»(£-^). □ 

8.3 The proof of Corollary Strong convexity of g 

First, we show that if condition (241 hold, then (251 holds. Since Vip given by § is Lipschitz 
continuous with L^g := ^||A|p, similar to the proof of Lemmawe have 


> 


Vk 


3 ^ _ Vk\\A\\^ 

4 2 2pg 


r^k 


2(1 - Tk)l3k 


-fc+l||2 _ T~k Vk Jj2 


( 66 ) 


where := (1 — Tk)G^^,{w^) — Gp^^{w^'^^). Under the condition (24), ( 66 ) implies (25). 

The update rule ( |27[ ) is in fact derived from (24). We finally prove the bounds (28). First, 
we consider the product By 


'^k Vk 




we have 
9Mg 


9^3 


2||A||2(fc-F4)2 " 2||A||2(fc-F3)(fc-h4) 4||A||2(fc-H4) 

By induction, it follows from (|25|) and this last expression that: 


- (1 - n) 


9^3 
4||A||2(fc+3y 


^ 16||A||2(fc-F3) ) 64||A||2j 


(67) 


whenever Gp^{w^) < ■ Since is given by (26), with the same argument as the proof 

of Lemma we can show that if ^ ^ then G^j(w^) < However, from 

the update rule (27), we can see that 70 = 2||^||2 s-nd /3i = . Using these quantities, 

we can clearly show that ^ = u^- Moreover, G^j(w^) < 

Hence, (67) holds. Finally, it remains to use Lemmato obtain (28). The second part in (30) 
is proved similarly. The estimate (|31[) is a direct consequence of (671. □ 


8.4 Convergence analysis of Algorithm 2

This appendix provides full proofs of the lemmas and theorems related to the convergence of
Algorithm 2.

8.4-1 The proof of Lemma^ Gap reduction condition 

We first require the following key lemma to analyze the convergence of our |SAD-MM| scheme, 
whose proof is similar to (54) and we omit the details here. 


Lemma 8 Let be generated by the SAD-MM scheme. Then, for A G one has 


W (A) + ^ {A'=+^ - A", A- A") - ^ II A" - A"+y I" + 


1 


rfcfl 


1 


rfc+l||2 




||A'=-A 


fc+l||2 


Vk' ' ' Vk" " ‘2Vk+i 

where A^ := A^ — Pk{Au^T^ + Bv^ — c) and Ay(A) := (p-y(A^) -|- A — A^) -|- i>{X). 
Now, we can prove Lemma We still use the same notations as in the proof of Lemma 
2 In addition, let us denote by := and := given in (11), 

F := Au^+^ + Bv^ -c and := + Bv'^ - c||. 

First, since + {'V(p'y{X^), X — A^) < ip 7 (A), it follows from Lemma|^that 


) < dhM-r (A) + - (a'^+i - A^ A- a'^) - - II a'^ - a'^+i f +1^ 
Vk Vk 27fc+i 


||P_Afc+i|| 2 ^ ( 68 ) 


Next, using [241 Theorem 2.1.5 (2.1.10)] with dehned in (10) and A := (1 — ti.)X^ + rfcA* for 
any £ [ 0 , 1 ], we have 


V57fc+i (''')< (l-^fc)‘^7fe+i(^ )+'kkVlk+A^ )- 


k\ 'C;,;(l 'kk)yk-\-l 11 ,.'.^ ||2 


-||«fe+i -4+ill • (69) 


Since Ip is convex, we also have ipiX) < (1 — Tk)i>(X^) + TfeV'(^^) and A — A^ = (1 — Tfe)A^ + 
TfcA^ — A^ = Ti^{X^ — A^) due to (33). Combining these expressions, the dehnition d-y := (fiy +-!/>, 
( 68 ) and (|69l) , we can derive 


d7r+i(A'=+') < (l-rfc)d7<^,(A'=)+rfed^^,(A'=) + a(Afe+i_Afc,Afc_A*) 

- il|A'=+'-A'=|P + M^llA'=+i-A'=f - (1 - ruW-^\ 


^fc +1 


-^fc+1 I 


On the one hand, since is the solution of the first convex subproblem in 
its optimality condition, we can show that 

¥^7.71 (A") - ^bl = {X\Aul^y)-g{ul^,)-^k+ibu{nl+y,u<^) - ^£>1 
< -5(u'=+i) - f^\\-z’^f-^k+ibuA’^+\uc) 


SAD-MM 


- ^\\A{ul+,-- 


(70) 


using 


(71) 


^fc+i 


On the other hand, similar to the proof of Lemma we can show that 

v>( V) < (V, Bv'^+^-c}-h{v’^+A- ^ p - ^(5'=+!, Am'=+i +B( 2 r}^ -c) 

= (A'=,Bi)'=+i - c) - h{v^+A - ^||P+^P + ^^||P+i||Dfe. 


Combining (711 and (72) and noting that dy{-) := + tpi'), we have 

d7^,(A'=) < (P,P+i) -/(x'^+p - f p'^+ip - f IIPP -7fe+i&WCc) 


-^||A( 11 ^+ 1 -P+PP- 2 ^I 


^/c +1 


2 /c+ 1||2 


\\Dk + ^bl 


Next, using the strong convexity of bu with = 1, we can show that 

^IPJ+i -P+^P +7fc+i6w(P+\Sc) > ^pPi-Wcp. 
Combining ( ffo] ), ( [SS] ), ( [73| and ( ffi] ) , we can derive 

d7;^PP+P < (l-rPd^jV) + a(P+i-ApP-A*P 
_i||A'=+i-Vp + |^||P+i-Pp 

- rkf{x '^+^) + Tfc (Ap 5'^+p - ^ II £'=+1 P - ^ P'= P 

- ^Pfe+i-Scp - {l-n)r,y>^\\nl^y - 

+ A — 'kk){'yk—'Yk+i)bu{uk+i,uA + 


(72) 


(73) 


(74) 


( 75 ) 
Let us define


Rk ■■= -Mfe+if+ ^T-fe||Mfc+i-Uc|P 

-(1 - Tk){lk - 7fc+l)^^(“fc+l,'w'')- 


(76) 


From 


SAD-MM we have = —rjkZ^'^^ and = —p^z^. Using these expressions 


and (76) into (|7^ we can simplify this estimate as 

d7^,{A'=+i) < (1 - Tk)d^,{V^)+Tu{z^+\\l) - rfe/(x'=+i) - (i±^pfc+if 

- + l^||A*^+'-A'=f - - A'^f - Rk (77) 

+ + R^Dl- 

Using again the elementary inequality !^||ap + Kpp > p^Ha — &|P, under the condition 


7fc+i ^ 


(vk in (34), we can show that 

- A*^||^ + - A'^f - - A'^f > 0. 

^Vk ^Pk ^Ik+i 


(78) 


On the other hand, similar to the proof of Lemmaj^ we can show that —> 

_!Z ^£)2 Ugjjjg ^jjjg inequality, ( [7^ , and A^ = —^ 2 *, we can simplify ( ffT] ) as 


d7.g_.(A'=+l) < (1 -r,)d^PP) - -rfe/(i'=+p -Ti, (i + P 


fcfl II2 


Rk + (^ 


Vk'Tk _1_ T'fcpfc f)2 




Since Pk+i > (1 “ Tk)Pk due to (34), similar to the proof of (61) we have 


AGk > (l-rfc)d^, (Ap - (a'^+I) - Tkf{x^+^)- ^ {z^+\z^)- 


'^k /^k+l -k\ 


Pk 


2(l-rfc)^fc 


k ii£fc+i ||2 


(79) 


(80) 


Combining (79) and (80), we get 


2 5 [(5 + "P* - (v|)ft 1 f + 7 - (pio? + Tpsj). (81) 


Next, we estimate Rk defined by ( |76[ ) as follows. We define := — iic, dfe ■= upi — Uc. 

Using 6 ;^(u^_|_j^, m'^) < —-u'^p, we can write Rk explicitly as 


7 = (1 - T-fc)rfc||afc - dkf + ^IPfcP - (1 - '^k){^ - l) 7 -h||afc|| 


2Rk _ 
Tfc+ 


— ( 2 


(3/2-ri)“fc|| + '^k) \_3-2Tk + 7fc+i) 

Since 7^+1 > ( — 3 - 2 Tfc^ — \ (|34|), it is easy to show that R}^ > 0. In addition, by 

y3-(2-Lj^ )Tk j 1f 

(34), we also have {l + 2Tk)pk~ — d- Using these conditions, we can show from (81) that 

^Gk > —— ^^^^Dk > — ^74^ + Uj, which is indeed the gap reduction condition 


(35). 


□ 
8.4.2 The proof of Lemma [1 Parameter updates

Similar to the proof of Lemma|^ we can show that the optimal rate of {r^.} is 0{l/k). From the 
conditions ( |34[ ), it is clear that if we choose ;= then 0 < < | < 1 for k > 0. Next, we 

choose 7 fc-)-i := ^ j 7 fe- Then 7 ^, satisfies (34). Substitnting into this formula we 

have 7 fc+i = j 7fc- By induction, we obtain 'y^+i = |^. Now, we choose rp, := = 

2 \\Ap{k+ 3 ) ■ condition of (|^, we choose pk := = 2 ||A|P(n 3 )(/c+ 4 ) ■ 


Pk = 


To derive an update for ftp, trom the third condition of (34) with equality, we can derive 

6||A||"(fc+3) 


(l-'rfc)(l+ 2 Tfc)r;fc 


7i(/c+l)(fc + r0) 


< 


9||A||" 

57i(fc+r) 

6||A||"(fc+4) 


. We need to check the second condition 

6||A||"(/c+3) 


Pk +1 > (1 - rk)Pk in (@. Indeed, we have Pk+i = ^ = .^^(fc_/i)(fc+ro) ^ 

which is true for all fc > 0. Hence, the second condition of (34) holds. □ 

8.4.3 The proof of Theorem 2: Convergence of Algorithm 2

First, we check the conditions of Lemmaj^ From the update rule (36), we have rfo = and 

/3i = . Hence, 871 = 10||A|p77o > 2||A|pr;o, which satisfies the first condition of Lemma 


§ 


= = Pi. Hence, the second condition of Lemmaj^holds. 

Next, since Pk = 2 ||A||^(fc+ 3 )(fc+ 4 ) ^nd % = ^Uplk+s) ^ ^an derive 


(57i-2’)o||A||2)j;(, 


'^k 3k _j_ rkPk 


8 I 71 


< 


8 I 71 


8||H||2(fc+3){fc+4)2 8P||2(fc+3){fc+4) 


- (l-'Tfe) 


8 I 71 


8||H||2(fc+2)(fc+3)’ 


Substituting this inequality into (35) and rearranging the result, we obtain

Gfe+i(m'=+i) - ^ [Gkiw’^) - 8|U||2(fe+2K 


8||H||2(fe+2)(fc+3)J' 


By induction, we obtain Gk{w^) - 8 ||Ap\I+^Kfc+ 3 ) - - '^iQ\\A\f ] < 0 as long as 


Go(w°) < ^g ||^2 ■ Now using Lemmaj^ we have Go(w°) < 
G,(mfc) < - 


Vo r,2 _ 7 i 

— U f — - 


4 - 8||A||2-^/ < rellAll 


27'tiDj 


. Hence, 


16||A||2(fc+2)(fc+3) ■ 


Finally, by using Lemma Bwith := Pk < > and simplifying the 

results, we obtain the bounds in ( |37[ ). If we choose 71 := \\A\\ then, we obtain the worst-case 
iteration-complexity of Algorithm!^ is 0{e~^). □ 